Publications

  • Kempen, G., & Harbusch, K. (1998). A 'tree adjoining' grammar without adjoining: The case of scrambling in German. In Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4).
  • Kempen, G., & Harbusch, K. (2004). How flexible is constituent order in the midfield of German subordinate clauses? A corpus study revealing unexpected rigidity. In S. Kepser, & M. Reis (Eds.), Pre-Proceedings of the International Conference on Linguistic Evidence (pp. 81-85). Tübingen: Niemeyer.
  • Kempen, G. (2004). Interactive visualization of syntactic structure assembly for grammar-intensive first- and second-language instruction. In R. Delmonte, P. Delcloque, & S. Tonelli (Eds.), Proceedings of InSTIL/ICALL2004 Symposium on NLP and speech technologies in advanced language learning systems (pp. 183-186). Venice: University of Venice.
  • Kempen, G., & Harbusch, K. (2004). How flexible is constituent order in the midfield of German subordinate clauses?: A corpus study revealing unexpected rigidity. In Proceedings of the International Conference on Linguistic Evidence (pp. 81-85). Tübingen: University of Tübingen.
  • Kempen, G. (2004). Human grammatical coding: Shared structure formation resources for grammatical encoding and decoding. In Cuny 2004 - The 17th Annual CUNY Conference on Human Sentence Processing. March 25-27, 2004. University of Maryland (pp. 66).
  • Kempen, G. (1994). Innovative language checking software for Dutch. In J. Van Gent, & E. Peeters (Eds.), Proceedings of the 2e Dag van het Document (pp. 99-100). Delft: TNO Technisch Physische Dienst.
  • Kempen, G. (1994). The unification space: A hybrid model of human syntactic processing [Abstract]. In Cuny 1994 - The 7th Annual CUNY Conference on Human Sentence Processing. March 17-19, 1994. CUNY Graduate Center, New York.
  • Kempen, G., & Dijkstra, A. (1994). Toward an integrated system for grammar, writing and spelling instruction. In L. Appelo, & F. De Jong (Eds.), Computer-Assisted Language Learning: Proceedings of the Seventh Twente Workshop on Language Technology (pp. 41-46). Enschede: University of Twente.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis of video-recorded continuous production of signs and gestures. The analysis involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and co-speech gestures produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaning-bearing phases.
  • Klatter-Folmer, J., Van Hout, R., Van den Heuvel, H., Fikkert, P., Baker, A., De Jong, J., Wijnen, F., Sanders, E., & Trilsbeek, P. (2014). Vulnerability in acquisition, language impairments in Dutch: Creating a VALID data archive. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 357-364).

    Abstract

    The VALID Data Archive is an open multimedia data archive (under construction) with data from speakers suffering from language impairments. We report on a pilot project in the CLARIN-NL framework in which five data resources were curated. For all data sets concerned, written informed consent from the participants or their caretakers has been obtained. All materials were anonymized. The audio files were converted into wav (linear PCM) files and the transcriptions into CHAT or ELAN format. Research data that consisted of test, SPSS and Excel files were documented and converted into CSV files. All data sets obtained appropriate CMDI metadata files. A new CMDI metadata profile for this type of data resource was established and care was taken that ISOcat metadata categories were used to optimize interoperability. After curation, all data are deposited at the Max Planck Institute for Psycholinguistics in Nijmegen, where persistent identifiers are linked to all resources. The content of the transcriptions in CHAT and plain text format can be searched with the TROVA search engine.
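
    The audio-conversion step described above (re-encoding recordings as linear PCM wav files) can be illustrated with a small sketch. The use of ffmpeg and the file names here are illustrative assumptions, not the archive's actual curation tooling.

    ```python
    # Hypothetical sketch of the audio-conversion step described in the abstract:
    # re-encode an arbitrary input recording as mono 16-bit linear PCM WAV.
    # ffmpeg and the file names are assumptions, not the project's actual pipeline.
    import subprocess

    def to_linear_pcm_wav(src: str, dst: str, sample_rate: int = 44100) -> None:
        """Convert `src` to a mono 16-bit linear PCM WAV file at `dst`."""
        subprocess.run(
            ["ffmpeg", "-y", "-i", src,
             "-ac", "1",                # mono
             "-ar", str(sample_rate),   # target sample rate
             "-c:a", "pcm_s16le",       # 16-bit linear PCM
             dst],
            check=True,
        )

    # Example: to_linear_pcm_wav("session01.mp3", "session01.wav")
    ```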
  • Klein, W. (2000). Changing concepts of the nature-nurture debate. In R. Hide, J. Mittelstrass, & W. Singer (Eds.), Changing concepts of nature at the turn of the millennium: Proceedings plenary session of the Pontifical academy of sciences, 26-29 October 1998 (pp. 289-299). Vatican City: Pontificia Academia Scientiarum.
  • Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. (2000). On forgetful attractor network memories. In H. Malmgren, M. Borga, & L. Niklasson (Eds.), Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-1 Conference, Göteborg, Sweden, 13-16 May 2000 (pp. 54-62). Heidelberg: Springer Verlag.

    Abstract

    A recurrently connected attractor neural network with a Hebbian learning rule is currently our best ANN analogy for a piece of cortex. Functionally, biological memory operates on a spectrum of time scales with regard to induction and retention, and it is modulated in complex ways by sub-cortical neuromodulatory systems. Moreover, biological memory networks are commonly believed to be highly distributed and to engage many co-operating cortical areas. Here we focus on the temporal aspects of induction and retention of memory in a connectionist-type attractor memory model of a piece of cortex. A continuous-time, forgetful Bayesian-Hebbian learning rule is described and compared to the characteristics of LTP and LTD seen experimentally. More generally, an attractor network implementing this learning rule can operate as a long-term, intermediate-term, or short-term memory. Modulation of the print-now signal of the learning rule replicates some experimental memory phenomena, such as the von Restorff effect.
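
    As a rough illustration of a "forgetful" attractor memory of the kind discussed here, the sketch below implements a generic Hopfield-style network whose Hebbian weights decay exponentially, so recently stored patterns dominate recall. It is a toy palimpsest model under simplifying assumptions, not the Bayesian-Hebbian (BCPNN) rule analysed in the paper.

    ```python
    # Toy "forgetful" attractor memory: a Hopfield-style network whose Hebbian
    # weights decay exponentially, so recent patterns overwrite older ones.
    # A generic palimpsest sketch, not the paper's Bayesian-Hebbian rule.
    import numpy as np

    rng = np.random.default_rng(0)
    N, LAMBDA = 100, 0.8          # units, retention factor per stored pattern

    W = np.zeros((N, N))
    patterns = [rng.choice([-1, 1], size=N) for _ in range(20)]

    for p in patterns:            # later patterns are remembered best
        W = LAMBDA * W + np.outer(p, p) / N
    np.fill_diagonal(W, 0)

    def recall(cue, steps=20):
        """Synchronous attractor dynamics from a (possibly noisy) cue."""
        s = cue.copy()
        for _ in range(steps):
            s = np.sign(W @ s)
            s[s == 0] = 1
        return s

    def overlap(a, b):
        return float(a @ b) / N

    noisy = patterns[-1] * np.where(rng.random(N) < 0.2, -1, 1)  # 20% flipped
    print("recent pattern overlap:", overlap(recall(noisy), patterns[-1]))
    print("oldest pattern overlap:", overlap(recall(patterns[0]), patterns[0]))
    ```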
  • Laparle, S. (2023). Moving past the lexical affiliate with a frame-based analysis of gesture meaning. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527218.

    Abstract

    Interpreting the meaning of co-speech gesture often involves identifying a gesture’s ‘lexical affiliate’, the word or phrase to which it most closely relates (Schegloff 1984). Though there is work within gesture studies that resists this simplex mapping of meaning from speech to gesture (e.g. de Ruiter 2000; Kendon 2014; Parrill 2008), including an evolving body of literature on recurrent gesture and gesture families (e.g. Fricke et al. 2014; Müller 2017), it is still the lexical affiliate model that is most apparent in formal linguistic models of multimodal meaning (e.g. Alahverdzhieva et al. 2017; Lascarides and Stone 2009; Pustejovsky and Krishnaswamy 2021; Schlenker 2020). In this work, I argue that the lexical affiliate should be carefully reconsidered in the further development of such models.

    In place of the lexical affiliate, I suggest a further shift toward a frame-based, action schematic approach to gestural meaning in line with that proposed in, for example, Parrill and Sweetser (2004) and Müller (2017). To demonstrate the utility of this approach I present three types of compositional gesture sequences which I call spatial contrast, spatial embedding, and cooperative abstract deixis. All three rely on gestural context, rather than gesture-speech alignment, to convey interactive (i.e. pragmatic) meaning. The centrality of gestural context to gesture meaning in these examples demonstrates the necessity of developing a model of gestural meaning independent of its integration with speech.
  • Latrouite, A., & Van Valin Jr., R. D. (2014). Event existentials in Tagalog: A Role and Reference Grammar account. In W. Arka, & N. L. K. Mas Indrawati (Eds.), Argument realisations and related constructions in Austronesian languages: papers from 12-ICAL (pp. 161-174). Canberra: Pacific Linguistics.
  • Lee, R., Chambers, C. G., Huettig, F., & Ganea, P. A. (2017). Children’s semantic and world knowledge overrides fictional information during anticipatory linguistic processing. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Meeting of the Cognitive Science Society (CogSci 2017) (pp. 730-735). Austin, TX: Cognitive Science Society.

    Abstract

    Using real-time eye-movement measures, we asked how a fantastical discourse context competes with stored representations of semantic and world knowledge to influence children's and adults' moment-by-moment interpretation of a story. Seven-year-olds were less effective at bypassing stored semantic and world knowledge during real-time interpretation than adults. Nevertheless, an effect of discourse context on comprehension was still apparent.
  • Lenkiewicz, P., Drude, S., Lenkiewicz, A., Gebre, B. G., Masneri, S., Schreer, O., Schwenninger, J., & Bardeli, R. (2014). Application of audio and video processing methods for language research and documentation: The AVATecH Project. In Z. Vetulani, & J. Mariani (Eds.), 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25-27, 2011, Revised Selected Papers (pp. 288-299). Berlin: Springer.

    Abstract

    The evolution and change of all modern languages is a well-known fact. Recently, however, this change has reached a pace never seen before, resulting in the loss of the vast amount of information encoded in every language. In order to preserve such rich heritage, and to carry out linguistic research, properly annotated recordings of world languages are necessary. Since creating those annotations is a very laborious task, often taking 100 times longer than the length of the annotated media, innovative video processing algorithms are needed in order to improve the efficiency and quality of the annotation process. This is the scope of the AVATecH project presented in this article.
  • Lenkiewicz, P., Shkaravska, O., Goosen, T., Windhouwer, M., Broeder, D., Roth, S., & Olsson, O. (2014). The DWAN framework: Application of a web annotation framework for the general humanities to the domain of language resources. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 3644-3649).
  • Lev-Ari, S., & Peperkamp, S. (2014). Do people converge to the linguistic patterns of non-reliable speakers? Perceptual learning from non-native speakers. In S. Fuchs, M. Grice, A. Hermes, L. Lancia, & D. Mücke (Eds.), Proceedings of the 10th International Seminar on Speech Production (ISSP) (pp. 261-264).

    Abstract

    People's language is shaped by the input from the environment. The environment, however, offers a range of linguistic inputs that differ in their reliability. We test whether listeners accordingly weigh input from sources that differ in reliability differently. Using a perceptual learning paradigm, we show that listeners adjust their representations according to linguistic input provided by native but not by non-native speakers. This is despite the fact that listeners are able to learn the characteristics of the speech of both speakers. These results provide evidence for a dissociation between adaptation to the characteristics of specific speakers and adjustment of linguistic representations in general based on these learned characteristics. This study also has implications for theories of language change. In particular, it casts doubt on the hypothesis that a large proportion of non-native speakers in a community can bring about linguistic change.
  • Levelt, W. J. M. (1991). Lexical access in speech production: Stages versus cascading. In H. Peters, W. Hulstijn, & C. Starkweather (Eds.), Speech motor control and stuttering (pp. 3-10). Amsterdam: Excerpta Medica.
  • Levelt, W. J. M. (1994). On the skill of speaking: How do we access words? In Proceedings ICSLP 94 (pp. 2253-2258). Yokohama: The Acoustical Society of Japan.
  • Levelt, W. J. M. (1994). Onder woorden brengen: Beschouwingen over het spreekproces. In Haarlemse voordrachten: voordrachten gehouden in de Hollandsche Maatschappij der Wetenschappen te Haarlem. Haarlem: Hollandsche maatschappij der wetenschappen.
  • Levelt, W. J. M. (1994). What can a theory of normal speaking contribute to AAC? In ISAAC '94 Conference Book and Proceedings. Hoensbroek: IRV.
  • Levinson, S. C. (2000). Language as nature and language as art. In J. Mittelstrass, & W. Singer (Eds.), Proceedings of the Symposium on ‘Changing concepts of nature at the turn of the Millennium’ (pp. 257-287). Vatican City: Pontificae Academiae Scientiarium Scripta Varia.
  • Levinson, S. C. (2000). H.P. Grice on location on Rossel Island. In S. S. Chang, L. Liaw, & J. Ruppenhofer (Eds.), Proceedings of the 25th Annual Meeting of the Berkeley Linguistic Society (pp. 210-224). Berkeley: Berkeley Linguistic Society.
  • Levinson, S. C. (1979). Pragmatics and social deixis: Reclaiming the notion of conventional implicature. In C. Chiarello (Ed.), Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society (pp. 206-223).
  • Levshina, N. (2023). Testing communicative and learning biases in a causal model of language evolution: A study of cues to Subject and Object. In M. Degano, T. Roberts, G. Sbardolini, & M. Schouwstra (Eds.), The Proceedings of the 23rd Amsterdam Colloquium (pp. 383-387). Amsterdam: University of Amsterdam.
  • Lew, A. A., Hall-Lew, L., & Fairs, A. (2014). Language and Tourism in Sabah, Malaysia and Edinburgh, Scotland. In B. O'Rourke, N. Bermingham, & S. Brennan (Eds.), Opening New Lines of Communication in Applied Linguistics: Proceedings of the 46th Annual Meeting of the British Association for Applied Linguistics (pp. 253-259). London, UK: Scitsiugnil Press.
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators. In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.

    Abstract

    Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.
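
    The openness dimensions the paper surveys (code, training data, model weights, RLHF data, licensing, scientific documentation, access methods) can be thought of as a per-project record with graded levels. The field names and levels below are illustrative assumptions, not the authors' actual assessment schema.

    ```python
    # Hypothetical sketch of recording per-project openness judgements along the
    # dimensions the paper surveys. Field names and levels are assumptions, not
    # the project's actual schema.
    from dataclasses import dataclass
    from enum import Enum

    class Level(Enum):
        OPEN = "open"
        PARTIAL = "partial"
        CLOSED = "closed"

    @dataclass
    class OpennessRecord:
        project: str
        source_code: Level
        training_data: Level
        model_weights: Level
        rlhf_data: Level
        license: Level
        scientific_documentation: Level
        access_methods: Level

    example = OpennessRecord(
        project="hypothetical-llm",
        source_code=Level.OPEN,
        training_data=Level.PARTIAL,
        model_weights=Level.OPEN,
        rlhf_data=Level.CLOSED,
        license=Level.PARTIAL,
        scientific_documentation=Level.CLOSED,
        access_methods=Level.OPEN,
    )
    print(example)
    ```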
  • Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial 2023). doi:10.18653/v1/2023.sigdial-1.45.

    Abstract

    Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This impacts especially the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies.
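
    The word error rates reported in study 1 follow the standard edit-distance formulation of WER; the sketch below is a minimal illustration of that metric, not the authors' evaluation code.

    ```python
    # Minimal word error rate (WER) sketch: Levenshtein distance over word tokens,
    # normalised by reference length. Illustrative only, not the paper's code.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("yeah no worries see you then", "yeah no worry see then"))  # ~0.33
    ```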
  • Little, H., & Silvey, C. (2014). Interpreting emerging structures: The interdependence of combinatoriality and compositionality. In Proceedings of the First Conference of the International Association for Cognitive Semiotics (IACS 2014) (pp. 113-114).
  • Little, H., Perlman, M., & Eryilmaz, K. (2017). Repeated interactions can lead to more iconic signals. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 760-765). Austin, TX: Cognitive Science Society.

    Abstract

    Previous research has shown that repeated interactions can cause iconicity in signals to reduce. However, data from several recent studies has shown the opposite trend: an increase in iconicity as the result of repeated interactions. Here, we discuss whether signals may become less or more iconic as a result of the modality used to produce them. We review several recent experimental results before presenting new data from multi-modal signals, where visual input creates audio feedback. Our results show that the growth in iconicity present in the audio information may come at a cost to iconicity in the visual information. Our results have implications for how we think about and measure iconicity in artificial signalling experiments. Further, we discuss how iconicity in real world speech may stem from auditory, kinetic or visual information, but iconicity in these different modalities may conflict.
  • Little, H., & Eryilmaz, K. (2014). The effect of physical articulation constraints on the emergence of combinatorial structure. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 11-17).
  • Little, H., & De Boer, B. (2014). The effect of size of articulation space on the emergence of combinatorial structure. In E. Cartmill A., S. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th international conference (EvoLangX) (pp. 479-481). Singapore: World Scientific.
  • Liu, Z., Chen, A., & Van de Velde, H. (2014). Prosodic focus marking in Bai. In N. Campbell, D. Gibbon, & D. Hirst (Eds.), Proceedings of Speech Prosody 2014 (pp. 628-631).

    Abstract

    This study investigates prosodic marking of focus in Bai, a Sino-Tibetan language spoken in the Southwest of China, by adopting a semi-spontaneous experimental approach. Our data show that Bai speakers increase the duration of the focused constituent and reduce the duration of the post-focus constituent to encode focus. However, duration is not used in Bai to distinguish focus types differing in size and contrastivity. Further, pitch plays no role in signaling focus and differentiating focus types. The results thus suggest that Bai uses prosody to mark focus, but to a lesser extent than Mandarin Chinese, with which Bai has been in close contact for centuries, and Cantonese, to which Bai is similar in its tonal system, although Bai resembles Cantonese in its reliance on duration in prosodic focus marking.
  • Majid, A., Van Staden, M., & Enfield, N. J. (2004). The human body in cognition, brain, and typology. In K. Hovie (Ed.), Forum Handbook, 4th International Forum on Language, Brain, and Cognition - Cognition, Brain, and Typology: Toward a Synthesis (pp. 31-35). Sendai: Tohoku University.

    Abstract

    The human body is unique: it is both an object of perception and the source of human experience. Its universality makes it a perfect resource for asking questions about how cognition, brain and typology relate to one another. For example, we can ask how speakers of different languages segment and categorize the human body. A dominant view is that body parts are “given” by visual perceptual discontinuities, and that words are merely labels for these visually determined parts (e.g., Andersen, 1978; Brown, 1976; Lakoff, 1987). However, there are problems with this view. First, it ignores other perceptual information, such as somatosensory and motoric representations. By looking at the neural representations of sensory information, we can test how much of the categorization of the human body can be done through perception alone. Second, we can look at language typology to see how much universality and variation there is in body-part categories. A comparison of a range of typologically, genetically and areally diverse languages shows that the perceptual view has only limited applicability (Majid, Enfield & van Staden, in press). For example, using a “coloring-in” task, where speakers of seven different languages were given a line drawing of a human body and asked to color in various body parts, Majid & van Staden (in prep) show that languages vary substantially in body part segmentation. For example, Jahai (Mon-Khmer) makes a lexical distinction between upper arm, lower arm, and hand, but Lavukaleve (Papuan Isolate) has just one word to refer to arm, hand, and leg. This shows that body part categorization is not a straightforward mapping of words to visually determined perceptual parts.
  • Majid, A., Van Staden, M., Boster, J. S., & Bowerman, M. (2004). Event categorization: A cross-linguistic perspective. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 885-890). Mahwah, NJ: Erlbaum.

    Abstract

    Many studies in cognitive science address how people categorize objects, but there has been comparatively little research on event categorization. This study investigated the categorization of events involving material destruction, such as “cutting” and “breaking”. Speakers of 28 typologically, genetically, and areally diverse languages described events shown in a set of video-clips. There was considerable cross-linguistic agreement in the dimensions along which the events were distinguished, but there was variation in the number of categories and the placement of their boundaries.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2017). Whether long-term tracking of speech rate affects perception depends on who is talking. In Proceedings of Interspeech 2017 (pp. 586-590). doi:10.21437/Interspeech.2017-1517.

    Abstract

    Speech rate is known to modulate perception of temporally ambiguous speech sounds. For instance, a vowel may be perceived as short when the immediate speech context is slow, but as long when the context is fast. Yet, effects of long-term tracking of speech rate are largely unexplored. Two experiments tested whether long-term tracking of rate influences perception of the temporal Dutch vowel contrast /ɑ/-/a:/. In Experiment 1, one low-rate group listened to 'neutral' rate speech from talker A and to slow speech from talker B. Another high-rate group was exposed to the same neutral speech from A, but to fast speech from B. Between-group comparison of the 'neutral' trials revealed that the low-rate group reported a higher proportion of /a:/ in A's 'neutral' speech, indicating that A sounded faster when B was slow. Experiment 2 tested whether one's own speech rate also contributes to effects of long-term tracking of rate. Here, talker B's speech was replaced by playback of participants' own fast or slow speech. No evidence was found that one's own voice affected perception of talker A in larger speech contexts. These results carry implications for our understanding of the mechanisms involved in rate-dependent speech perception and of dialogue.
  • Matic, D., & Nikolaeva, I. (2014). Focus feature percolation: Evidence from Tundra Nenets and Tundra Yukaghir. In S. Müller (Ed.), Proceedings of the 21st International Conference on Head-Driven Phrase Structure Grammar (HPSG 2014) (pp. 299-317). Stanford, CA: CSLI Publications.

    Abstract

    Two Siberian languages, Tundra Nenets and Tundra Yukaghir, do not obey strong island constraints in questioning: any sub-constituent of a relative or adverbial clause can be questioned. We argue that this has to do with how focusing works in these languages. The focused sub-constituent remains in situ, but there is abundant morphosyntactic evidence that the focus feature is passed up to the head of the clause. The result is the formation of a complex focus structure in which both the head and non-head daughter are overtly marked as focus, and they are interpreted as a pairwise list such that the focus background is applicable to this list, but not to other alternative lists.
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Positive and negative influences of the lexicon on phonemic decision-making. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 778-781). Beijing: China Military Friendship Publish.

    Abstract

    Lexical knowledge influences how human listeners make decisions about speech sounds. Positive lexical effects (faster responses to target sounds in words than in nonwords) are robust across several laboratory tasks, while negative effects (slower responses to targets in more word-like nonwords than in less word-like nonwords) have been found in phonetic decision tasks but not phoneme monitoring tasks. The present experiments tested whether negative lexical effects are therefore a task-specific consequence of the forced choice required in phonetic decision. We compared phoneme monitoring and phonetic decision performance using the same Dutch materials in each task. In both experiments there were positive lexical effects, but no negative lexical effects. We observe that in all studies showing negative lexical effects, the materials were made by cross-splicing, which meant that they contained perceptual evidence supporting the lexically-consistent phonemes. Lexical knowledge seems to influence phonemic decision-making only when there is evidence for the lexically-consistent phoneme in the speech signal.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Why Merge really is autonomous and parsimonious. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 47-50). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    We briefly describe the Merge model of phonemic decision-making, and, in the light of general arguments about the possible role of feedback in spoken-word recognition, defend Merge's feedforward structure. Merge not only accounts adequately for the data, without invoking feedback connections, but does so in a parsimonious manner.
  • Micklos, A. (2014). The nature of language in interaction. In E. Cartmill, S. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference.
  • Mizera, P., Pollak, P., Kolman, A., & Ernestus, M. (2014). Impact of irregular pronunciation on phonetic segmentation of Nijmegen corpus of Casual Czech. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings (pp. 499-506). Heidelberg: Springer.

    Abstract

    This paper describes a pilot study of phonetic segmentation applied to the Nijmegen Corpus of Casual Czech (NCCCz). This corpus contains informal speech of a strongly spontaneous nature, which influences the character of the produced speech at various levels. This work is part of wider research related to the analysis of pronunciation reduction in such informal speech. We present an analysis of the accuracy of phonetic segmentation when canonical or reduced pronunciation is used. The achieved accuracy of the phonetic segmentation provides information about the general accuracy of the acoustic modelling that is to be applied in spontaneous speech recognition. As a byproduct of the presented spontaneous speech segmentation, this paper also describes the created lexicon with canonical pronunciations of words in NCCCz, a tool supporting pronunciation checks of lexicon items, and finally a mini-database of selected utterances from NCCCz manually labelled at the phonetic level, suitable for evaluation purposes.
  • Monaghan, P., Brand, J., Frost, R. L. A., & Taylor, G. (2017). Multiple variable cues in the environment promote accurate and robust word learning. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 817-822). Retrieved from https://mindmodeling.org/cogsci2017/papers/0164/index.html.

    Abstract

    Learning how words refer to aspects of the environment is a complex task, but one that is supported by numerous cues within the environment which constrain the possibilities for matching words to their intended referents. In this paper we tested the predictions of a computational model of multiple cue integration for word learning, which predicted that variation in the presence of cues provides an optimal learning situation. In a cross-situational learning task with adult participants, we varied the reliability of presence of distributional, prosodic, and gestural cues. We found that the best learning occurred when cues were often present, but not always. Variability increased the salience of individual cues for the learner, but resulted in robust learning that was not vulnerable to an individual cue’s presence or absence. Thus, variability of multiple cues in the language-learning environment provided the optimal circumstances for word learning.
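
    The cross-situational learning task used in this study can be illustrated with a minimal co-occurrence learner. The mini-lexicon and counting strategy below are illustrative assumptions and do not reproduce the paper's multiple-cue integration model.

    ```python
    # Toy cross-situational learner: accumulate word-referent co-occurrence counts
    # over ambiguous learning situations and pick the most frequent pairing.
    # A sketch of the task type only, not the paper's model.
    from collections import defaultdict
    from itertools import product

    # Each situation pairs spoken words with visible referents, without saying
    # which word labels which referent (hypothetical mini-lexicon).
    situations = [
        (["bosa", "kemi"], ["DOG", "CUP"]),
        (["kemi", "tulo"], ["CUP", "BALL"]),
        (["bosa", "tulo"], ["DOG", "BALL"]),
        (["kemi", "bosa"], ["CUP", "DOG"]),
    ]

    counts = defaultdict(int)
    for words, referents in situations:
        for w, r in product(words, referents):
            counts[(w, r)] += 1

    vocabulary = {w for ws, _ in situations for w in ws}
    lexicon = {w: max((r for (w2, r) in counts if w2 == w),
                      key=lambda r: counts[(w, r)]) for w in vocabulary}
    print(lexicon)   # e.g. {'bosa': 'DOG', 'kemi': 'CUP', 'tulo': 'BALL'}
    ```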
  • Nabrotzky, J., Ambrazaitis, G., Zellers, M., & House, D. (2023). Temporal alignment of manual gestures’ phase transitions with lexical and post-lexical accentual F0 peaks in spontaneous Swedish interaction. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527194.

    Abstract

    Many studies investigating the temporal alignment of co-speech gestures to acoustic units in the speech signal find a close coupling of the gestural landmarks and pitch accents or the stressed syllable of pitch-accented words. In English, a pitch accent is anchored in the lexically stressed syllable. Hence, it is unclear whether it is the lexical phonological dimension of stress, or the phrase-level prominence that determines the details of speech-gesture synchronization. This paper explores the relation between gestural phase transitions and accentual F0 peaks in Stockholm Swedish, which exhibits a lexical pitch accent distinction. When produced with phrase-level prominence, there are three different configurations of lexicality of F0 peaks and the status of the syllable it is aligned with. Through analyzing the alignment of the different F0 peaks with gestural onsets in spontaneous dyadic conversations, we aim to contribute to our understanding of the role of lexical prosodic phonology in the co-production of speech and gesture. The results, though limited by a small dataset, still suggest differences between the three types of peaks concerning which types of gesture phase onsets they tend to align with, and how well these landmarks align with each other, although these differences did not reach significance.
  • Norris, D., McQueen, J. M., & Cutler, A. (1994). Competition and segmentation in spoken word recognition. In Proceedings of the Third International Conference on Spoken Language Processing: Vol. 1 (pp. 401-404). Yokohama: PACIFICO.

    Abstract

    This paper describes recent experimental evidence which shows that models of spoken word recognition must incorporate both inhibition between competing lexical candidates and a sensitivity to metrical cues to lexical segmentation. A new version of the Shortlist [1][2] model incorporating the Metrical Segmentation Strategy [3] provides a detailed simulation of the data.
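
    The inhibition between competing lexical candidates referred to above can be sketched as a simple lateral-inhibition loop. The candidate set, match scores, and parameters below are invented for illustration and are not the Shortlist model's actual equations.

    ```python
    # Toy competition between lexical candidates via lateral inhibition:
    # candidates that better match the input gradually suppress their rivals.
    # In the spirit of the competition stage described here, not the actual
    # Shortlist equations or parameters.
    import numpy as np

    candidates = ["ship", "shipment", "mint", "inquest"]
    bottom_up = np.array([0.55, 0.60, 0.40, 0.35])   # hypothetical match scores

    act = bottom_up.copy()
    for _ in range(30):
        inhibition = 0.25 * (act.sum() - act)        # input from all competitors
        act = np.clip(act + 0.1 * (bottom_up - inhibition - act), 0.0, 1.0)

    for w, a in sorted(zip(candidates, act), key=lambda x: -x[1]):
        print(f"{w:10s} {a:.2f}")                    # "shipment" ends up strongest
    ```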
  • Norris, D., Cutler, A., McQueen, J. M., Butterfield, S., & Kearns, R. K. (2000). Language-universal constraints on the segmentation of English. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 43-46). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) [1] is a language-specific or language-universal strategy for the segmentation of continuous speech. The PWC disfavours parses which leave an impossible residue between the end of a candidate word and a known boundary. The experiments examined cases where the residue was either a CV syllable with a lax vowel, or a CVC syllable with a schwa. Although neither syllable context is a possible word in English, word-spotting in both contexts was easier than with a context consisting of a single consonant. The PWC appears to be language-universal rather than language-specific.
  • Norris, D., Cutler, A., & McQueen, J. M. (2000). The optimal architecture for simulating spoken-word recognition. In C. Davis, T. Van Gelder, & R. Wales (Eds.), Cognitive Science in Australia, 2000: Proceedings of the Fifth Biennial Conference of the Australasian Cognitive Science Society. Adelaide: Causal Productions.

    Abstract

    Simulations explored the inability of the TRACE model of spoken-word recognition to model the effects on human listening of subcategorical mismatch in word forms. The source of TRACE's failure lay not in interactive connectivity, not in the presence of inter-word competition, and not in the use of phonemic representations, but in the need for continuously optimised interpretation of the input. When an analogue of TRACE was allowed to cycle to asymptote on every slice of input, an acceptable simulation of the subcategorical mismatch data was achieved. Even then, however, the simulation was not as close as that produced by the Merge model, which has inter-word competition, phonemic representations and continuous optimisation (but no interactive connectivity).
  • Offrede, T., Mishra, C., Skantze, G., Fuchs, S., & Mooshammer, C. (2023). Do Humans Converge Phonetically When Talking to a Robot? In R. Skarnitzl, & J. Volin (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 3507-3511). Prague: GUARANT International.

    Abstract

    Phonetic convergence—i.e., adapting one’s speech
    towards that of an interlocutor—has been shown
    to occur in human-human conversations as well as
    human-machine interactions. Here, we investigate
    the hypothesis that human-to-robot convergence is
    influenced by the human’s perception of the robot
    and by the conversation’s topic. We conducted a
    within-subjects experiment in which 33 participants
    interacted with two robots differing in their eye gaze
    behavior—one looked constantly at the participant;
    the other produced gaze aversions, similarly to a
    human’s behavior. Additionally, the robot asked
    questions with increasing intimacy levels.
    We observed that the speakers tended to converge
    on F0 to the robots. However, this convergence
    to the robots was not modulated by how the
    speakers perceived them or by the topic’s intimacy.
    Interestingly, speakers produced lower F0 means
    when talking about more intimate topics. We
    discuss these findings in terms of current theories of
    conversational convergence.
  • Ortega, G., Schiefner, A., & Ozyurek, A. (2017). Speakers’ gestures predict the meaning and perception of iconicity in signs. In G. Gunzelmann, A. Howes, & T. Tenbrink (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 889-894). Austin, TX: Cognitive Science Society.

    Abstract

    Sign languages stand out in that there is high prevalence of conventionalised linguistic forms that map directly to their referent (i.e., iconic). Hearing adults show low performance when asked to guess the meaning of iconic signs, suggesting that their iconic features are largely inaccessible to them. However, it has not been investigated whether speakers’ gestures, which also share the property of iconicity, may assist non-signers in guessing the meaning of signs. Results from a pantomime generation task (Study 1) show that speakers’ gestures exhibit a high degree of systematicity, and share different degrees of form overlap with signs (full, partial, and no overlap). Study 2 shows that signs with full and partial overlap are more accurately guessed and are assigned higher iconicity ratings than signs with no overlap. Deaf and hearing adults converge in their iconic depictions for some concepts due to the shared conceptual knowledge and manual-visual modality.
  • Ortega, G., Sumer, B., & Ozyurek, A. (2014). Type of iconicity matters: Bias for action-based signs in sign language acquisition. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1114-1119). Austin, Tx: Cognitive Science Society.

    Abstract

    Early studies investigating sign language acquisition claimed that signs whose structures are motivated by the form of their referent (iconic) are not favoured in language development. However, recent work has shown that the first signs in deaf children’s lexicon are iconic. In this paper we go a step further and ask whether different types of iconicity modulate learning sign-referent links. Results from a picture description task indicate that children and adults used signs with two possible variants differentially. While children signing to adults favoured variants that map onto actions associated with a referent (action signs), adults signing to another adult produced variants that map onto objects’ perceptual features (perceptual signs). Parents interacting with children used more action variants than signers in adult-adult interactions. These results are in line with claims that language development is tightly linked to motor experience and that iconicity can be a communicative strategy in parental input.
  • Otake, T., & Cutler, A. (2000). A set of Japanese word cohorts rated for relative familiarity. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 766-769). Beijing: China Military Friendship Publish.

    Abstract

    A database is presented of relative familiarity ratings for 24 sets of Japanese words, each set comprising words overlapping in the initial portions. These ratings are useful for the generation of material sets for research in the recognition of spoken words.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Ozyurek, A. (1994). How children talk about a conversation. In K. Beals, J. Denton, R. Knippen, L. Melnar, H. Suzuki, & E. Zeinfeld (Eds.), Papers from the Thirtieth Regional Meeting of the Chicago Linguistic Society: Main Session (pp. 309-319). Chicago, Ill: Chicago Linguistic Society.
  • Ozyurek, A. (1994). How children talk about conversations: Development of roles and voices. In E. V. Clark (Ed.), Proceedings of the Twenty-Sixth Annual Child Language Research Forum (pp. 197-206). Stanford: CSLI Publications.
  • Ozyurek, A., & Ozcaliskan, S. (2000). How do children learn to conflate manner and path in their speech and gestures? Differences in English and Turkish. In E. V. Clark (Ed.), The proceedings of the Thirtieth Child Language Research Forum (pp. 77-85). Stanford: CSLI Publications.
  • Peeters, D., Azar, Z., & Ozyurek, A. (2014). The interplay between joint attention, physical proximity, and pointing gesture in demonstrative choice. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1144-1149). Austin, Tx: Cognitive Science Society.
  • Perlman, M., Clark, N., & Tanner, J. (2014). Iconicity and ape gesture. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 236-243). New Jersey: World Scientific.

    Abstract

    Iconic gestures are hypothesized to be crucial to the evolution of language. Yet the important question of whether apes produce iconic gestures is the subject of considerable debate. This paper presents the current state of research on iconicity in ape gesture. In particular, it describes some of the empirical evidence suggesting that apes produce three different kinds of iconic gestures; it compares the iconicity hypothesis to other major hypotheses of ape gesture; and finally, it offers some directions for future ape gesture research.
  • Perlman, M., Fusaroli, R., Fein, D., & Naigles, L. (2017). The use of iconic words in early child-parent interactions. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 913-918). Austin, TX: Cognitive Science Society.

    Abstract

    This paper examines the use of iconic words in early conversations between children and caregivers. The longitudinal data include a span of six observations of 35 children-parent dyads in the same semi-structured activity. Our findings show that children’s speech initially has a high proportion of iconic words, and over time, these words become diluted by an increase of arbitrary words. Parents’ speech is also initially high in iconic words, with a decrease in the proportion of iconic words over time – in this case driven by the use of fewer iconic words. The level and development of iconicity are related to individual differences in the children’s cognitive skills. Our findings fit with the hypothesis that iconicity facilitates early word learning and may play an important role in learning to produce new words.
  • Popov, V., Ostarek, M., & Tenison, C. (2017). Inferential Pitfalls in Decoding Neural Representations. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 961-966). Austin, TX: Cognitive Science Society.

    Abstract

    A key challenge for cognitive neuroscience is to decipher the representational schemes of the brain. A recent class of decoding algorithms for fMRI data, stimulus-feature-based encoding models, is becoming increasingly popular for inferring the dimensions of neural representational spaces from stimulus-feature spaces. We argue that such inferences are not always valid, because decoding can occur even if the neural representational space and the stimulus-feature space use different representational schemes. This can happen when there is a systematic mapping between them. In a simulation, we successfully decoded the binary representation of numbers from their decimal features. Since binary and decimal number systems use different representations, we cannot conclude that the binary representation encodes decimal features. The same argument applies to the decoding of neural patterns from stimulus-feature spaces, and we urge caution in inferring the nature of the neural code from such methods. We discuss ways to overcome these inferential limitations.
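
    The binary-from-decimal simulation mentioned in the abstract can be approximated with a toy decoder. The sketch below (assumed setup: numbers 0-99, one-hot decimal-digit features, scikit-learn logistic regression) merely illustrates that a systematic mapping licenses successful decoding without the features themselves being binary; it is not the authors' code.

    ```python
    # Toy reconstruction of the paper's point: a *binary* property of a number
    # can be decoded from purely *decimal* features, because the two codes are
    # systematically related, not because the features "are" binary.
    # Illustrative sketch only, not the authors' simulation.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    numbers = np.arange(100)

    # Decimal feature space: one-hot codes for the tens and ones digits (20 dims).
    def decimal_features(n):
        feats = np.zeros(20)
        feats[n // 10] = 1          # tens digit
        feats[10 + n % 10] = 1      # ones digit
        return feats

    X = np.array([decimal_features(n) for n in numbers])
    y = numbers & 1                 # lowest *binary* bit (parity)

    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print("decoding accuracy for binary bit 0:", scores.mean())   # ~1.0
    ```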
  • Pouw, W., Aslanidou, A., Kamermans, K. L., & Paas, F. (2017). Is ambiguity detection in haptic imagery possible? Evidence for Enactive imaginings. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 2925-2930). Austin, TX: Cognitive Science Society.

    Abstract

    A classic discussion about visual imagery is whether it affords reinterpretation, like discovering two interpretations in the duck/rabbit illustration. Recent findings converge on reinterpretation being possible in visual imagery, suggesting functional equivalence with pictorial representations. However, it is unclear whether such reinterpretations are necessarily a visual-pictorial achievement. To assess this, 68 participants were briefly presented with 2-d ambiguous figures. One figure was presented visually, the other via manual touch alone. Afterwards, participants mentally rotated the memorized figures so as to discover a novel interpretation. A portion (20.6%) of the participants detected a novel interpretation in visual imagery, replicating previous research. Strikingly, 23.6% of participants were able to reinterpret figures they had only felt. That reinterpretation truly involved haptic processes was further supported, as some participants performed co-thought gestures on an imagined figure during retrieval. These results are promising for further development of an Enactivist approach to imagination.
  • Ravignani, A., Bowling, D., & Kirby, S. (2014). The psychology of biological clocks: A new framework for the evolution of rhythm. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 262-269). Singapore: World Scientific.
  • Roberts, S. G., Dediu, D., & Levinson, S. C. (2014). Detecting differences between the languages of Neandertals and modern humans. In E. A. Cartmill, S. G. Roberts, H. Lyn, & H. Cornish (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 501-502). Singapore: World Scientific.

    Abstract

    Dediu and Levinson (2013) argue that Neandertals had essentially modern language and speech, and that they were in genetic contact with the ancestors of modern humans during our dispersal out of Africa. This raises the possibility of cultural and linguistic contact between the two human lineages. If such contact did occur, then it might have influenced the cultural evolution of the languages. Since the genetic traces of contact with Neandertals are limited to the populations outside of Africa, Dediu & Levinson predict that there may be structural differences between the present-day languages derived from languages in contact with Neanderthals, and those derived from languages that were not influenced by such contact. Since the signature of such deep contact might reside in patterns of features, they suggested that machine learning methods may be able to detect these differences. This paper attempts to test this hypothesis and to estimate particular linguistic features that are potential candidates for carrying a signature of Neandertal languages.
  • Roberts, S. G., & De Vos, C. (2014). Gene-culture coevolution of a linguistic system in two modalities. In B. De Boer, & T. Verhoef (Eds.), Proceedings of Evolang X, Workshop on Signals, Speech, and Signs (pp. 23-27).

    Abstract

    Complex communication can take place in a range of modalities, such as the auditory, visual, and tactile modalities. In a very general way, the modality that individuals use is constrained by their biological biases (humans cannot use magnetic fields directly to communicate with each other). The majority of natural languages have a large audible component. However, since humans can learn sign languages just as easily, it is not clear to what extent the prevalence of spoken languages is due to biological biases, the social environment or cultural inheritance. This paper suggests that we can explore the relative contribution of these factors by modelling the spontaneous emergence of sign languages that are shared by the deaf and hearing members of relatively isolated communities. Such shared signing communities have arisen in enclaves around the world and may provide useful insights by demonstrating how languages evolve when a proportion of a community's members (the deaf) has a strong bias towards the visual language modality. In this paper we describe a model of cultural evolution in two modalities, combining aspects that are thought to impact the emergence of sign languages in a more general evolutionary framework. The model can be used to explore hypotheses about how sign languages emerge.
  • Roberts, S. G., Thompson, B., & Smith, K. (2014). Social interaction influences the evolution of cognitive biases for language. In E. A. Cartmill, S. G. Roberts, & H. Lyn (Eds.), The Evolution of Language: Proceedings of the 10th International Conference (pp. 278-285). Singapore: World Scientific. doi:10.1142/9789814603638_0036.

    Abstract

    Models of cultural evolution demonstrate that the link between individual biases and population-level phenomena can be obscured by the process of cultural transmission (Kirby, Dowman, & Griffiths, 2007). However, recent extensions to these models predict that linguistic diversity will not emerge and that learners should evolve to expect little linguistic variation in their input (Smith & Thompson, 2012). We demonstrate that this result derives from assumptions that privilege certain kinds of social interaction by exploring a range of alternative social models. We find several evolutionary routes to linguistic diversity, and show that social interaction not only influences the kinds of biases which could evolve to support language, but also the effects those biases have on a linguistic system. Given the same starting situation, the evolution of biases for language learning and the distribution of linguistic variation are affected by the kinds of social interaction that a population privileges.
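
    The transmission models this work builds on can be illustrated with a minimal iterated-learning chain. The prior strength, bottleneck size, and learning rule below are illustrative assumptions rather than the specific models compared in the paper.

    ```python
    # Minimal iterated-learning chain in the style of the Bayesian transmission
    # models the abstract references (Kirby, Dowman & Griffiths, 2007): each
    # learner estimates the probability of variant A from the previous learner's
    # output. Parameters and learning rule are illustrative assumptions.
    import random

    PRIOR_A = 0.6        # weak individual bias towards variant A
    N_UTTERANCES = 10    # bottleneck: data passed between generations

    def learn(data):
        """Posterior mean estimate of P(variant A) under a weak Beta prior."""
        a, b = 2 * PRIOR_A, 2 * (1 - PRIOR_A)      # Beta prior, strength 2
        return (a + sum(data)) / (a + b + len(data))

    def produce(p_a, n=N_UTTERANCES):
        return [1 if random.random() < p_a else 0 for _ in range(n)]

    random.seed(1)
    p_a, trajectory = PRIOR_A, []
    for generation in range(30):
        data = produce(p_a)          # previous generation's output
        p_a = learn(data)            # next learner's estimate
        trajectory.append(round(p_a, 2))
    print(trajectory)   # drifts generation by generation, pulled by the weak prior
    ```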
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces (LREC 2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • Sander, J., Lieberman, A., & Rowland, C. F. (2023). Exploring joint attention in American Sign Language: The influence of sign familiarity. In M. Goldwater, F. K. Anggoro, B. K. Hayes, & D. C. Ong (Eds.), Proceedings of the 45th Annual Meeting of the Cognitive Science Society (CogSci 2023) (pp. 632-638).

    Abstract

    Children’s ability to share attention with another social partner (i.e., joint attention) has been found to support language development. Despite the large amount of research examining the effects of joint attention on language in hearing populations, little is known about how deaf children learning sign languages achieve joint attention with their caregivers during natural social interaction and how caregivers provide and scaffold learning opportunities for their children. The present study investigates the properties and timing of joint attention surrounding familiar and novel naming events and their relationship to children’s vocabulary. Naturalistic play sessions of caregiver-child dyads using American Sign Language were analyzed with regard to naming events involving either familiar or novel object labels and the surrounding joint attention events. We observed that most naming events took place in the context of a successful joint attention event and that sign familiarity was related to the timing of naming events within the joint attention events. Our results suggest that caregivers are highly sensitive to their child’s visual attention in interactions and modulate joint attention differently in the context of naming events of familiar vs. novel object labels.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, which were recorded in a wide variety of acoustic conditions. The results show that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is a more reliable predictor of the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Schmidt, J., Janse, E., & Scharenborg, O. (2014). Age, hearing loss and the perception of affective utterances in conversational speech. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 1929-1933).

    Abstract

    This study investigates whether age and/or hearing loss influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech fragments. Specifically, this study focuses on the relationship between participants' ratings of affective speech and acoustic parameters known to be associated with arousal and valence (mean F0, intensity, and articulation rate). Ten normal-hearing younger and ten older adults with varying hearing loss were tested on two rating tasks. Stimuli consisted of short sentences taken from a corpus of conversational affective speech. In both rating tasks, participants estimated the value of the emotion dimension at hand using a 5-point scale. For arousal, higher intensity was generally associated with higher arousal in both age groups. Compared to younger participants, older participants rated the utterances as less aroused, and showed a smaller effect of intensity on their arousal ratings. For valence, higher mean F0 was associated with more negative ratings in both age groups. Generally, age group differences in rating affective utterances may not relate to age group differences in hearing loss, but rather to other differences between the age groups, as older participants' rating patterns were not associated with their individual hearing loss.
  • Schuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., & Zafeiriou, S. (2017). The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold & snoring. In Proceedings of Interspeech 2017 (pp. 3442-3446). doi:10.21437/Interspeech.2017-43.

    Abstract

    The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: In the Addressee sub-challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold sub-challenge, speech under cold has to be told apart from ‘healthy’ speech; and in the Snoring sub-challenge, four different types of snoring have to be classified. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt feature representations by end-to-end learning with convolutional and recurrent neural networks, and bag-of-audio-words for the first time in the challenge series.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Sekine, K. (2017). Gestural hesitation reveals children’s competence on multimodal communication: Emergence of disguised adaptor. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3113-3118). Austin, TX: Cognitive Science Society.

    Abstract

    Speakers sometimes modify their gestures during the process of production into adaptors such as hair touching or eye scratching. Such disguised adaptors are evidence that the speaker can monitor their gestures. In this study, we investigated when and how disguised adaptors are first produced by children. Sixty elementary school children participated in this study (ten children in each age group; from 7 to 12 years old). They were instructed to watch a cartoon and retell it to their parents. The results showed that children did not produce disguised adaptors until the age of 8. The disguised adaptors accompany fluent speech until the children are 10 years old and accompany dysfluent speech until they reach 11 or 12 years of age. These results suggest that children start to monitor their gestures when they are 9 or 10 years old. Cognitive changes were considered as factors influencing the emergence of disguised adaptors.
  • Sekine, K., & Kajikawa, T. (2023). Does the spatial distribution of a speaker's gaze and gesture impact on a listener's comprehension of discourse? In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527208.

    Abstract

    This study investigated the impact of a speaker's gaze direction on a listener's comprehension of discourse. Previous research suggests that hand gestures play a role in referent allocation, enabling listeners to better understand the discourse. The current study aims to determine whether the speaker's gaze direction has a similar effect on reference resolution as co-speech gestures. Thirty native Japanese speakers participated in the study and were assigned to one of three conditions: congruent, incongruent, or speech-only. Participants watched 36 videos of an actor narrating a story consisting of three sentences with two protagonists. The speaker consistently used hand gestures to allocate one protagonist to the lower right and the other to the lower left space, while directing her gaze to either the space of the target person (congruent), the other person (incongruent), or no particular space (speech-only). Participants were required to verbally answer a question about the target protagonist involved in an accidental event as quickly as possible. Results indicate that participants in the congruent condition exhibited faster reaction times than those in the incongruent condition, although the difference was not significant. These findings suggest that the speaker's gaze direction is not enough to facilitate a listener's comprehension of discourse.
  • Senft, G. (1991). Bakavilisi Biga - we can 'turn' the language - or: What happens to English words in Kilivila language? In W. Bahner, J. Schildt, & D. Viehwegger (Eds.), Proceedings of the XIVth International Congress of Linguists (pp. 1743-1746). Berlin: Akademie Verlag.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Seuren, P. A. M. (1991). Notes on noun phrases and quantification. In Proceedings of the International Conference on Current Issues in Computational Linguistics (pp. 19-44). Penang, Malaysia: Universiti Sains Malaysia.
  • Seuren, P. A. M. (1994). The computational lexicon: All lexical content is predicate. In Z. Yusoff (Ed.), Proceedings of the International Conference on Linguistic Applications 26-28 July 1994 (pp. 211-216). Penang: Universiti Sains Malaysia, Unit Terjemahan Melalui Komputer (UTMK).
  • Seuren, P. A. M. (2014). Scope and external datives. In B. Cornillie, C. Hamans, & D. Jaspers (Eds.), Proceedings of a mini-symposium on Pieter Seuren's 80th birthday organised at the 47th Annual Meeting of the Societas Linguistica Europaea.

    Abstract

    In this study it is argued that scope, as a property of scope‐creating operators, is a real and important element in the semantico‐grammatical description of languages. The notion of scope is illustrated and, as far as possible, defined. A first idea is given of the ‘grammar of scope’, which defines the relation between scope in the logically structured semantic analysis (SA) of sentences on the one hand and surface structure on the other. Evidence is adduced showing that peripheral preposition phrases (PPPs) in the surface structure of sentences represent scope‐creating operators in SA, and that external datives fall into this category: they are scope‐creating PPPs. It follows that, in English and Dutch, the internal dative (I gave John a book) and the external dative (I gave a book to John) are not simple syntactic variants expressing the same meaning. Instead, internal datives are an integral part of the argument structure of the matrix predicate, whereas external datives represent scope‐creating operators in SA. In the Romance languages, the (non‐pronominal) external dative has been re‐analysed as an argument type dative, but this has not happened in English and Dutch, which have many verbs that only allow for an external dative (e.g. donate, reveal). When both datives are allowed, there are systematic semantic differences, including scope differences.
  • Seuren, P. A. M. (1991). What makes a text untranslatable? In H. M. N. Noor Ein, & H. S. Atiah (Eds.), Pragmatik Penterjemahan: Prinsip, Amalan dan Penilaian Menuju ke Abad 21 ("The Pragmatics of Translation: Principles, Practice and Evaluation Moving towards the 21st Century") (pp. 19-27). Kuala Lumpur: Dewan Bahasa dan Pustaka.
  • Seuren, P. A. M. (1994). Translation relations in semantic syntax. In G. Bouma, & G. Van Noord (Eds.), CLIN IV: Papers from the Fourth CLIN Meeting (pp. 149-162). Groningen: Vakgroep Alfa-informatica, Rijksuniversiteit Groningen.
  • Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2023). Syllable rate drives rate normalization, but is not the only factor. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 56-60). Prague: Guarant International.

    Abstract

    Speech is perceived relative to the speech rate in the context. It is unclear, however, what information listeners use to compute speech rate. The present study examines whether listeners use the number of syllables per unit time (i.e., syllable rate) as a measure of speech rate, as indexed by subsequent vowel perception. We ran two rate-normalization experiments in which participants heard duration-matched word lists that contained either monosyllabic vs. bisyllabic words (Experiment 1), or monosyllabic vs. trisyllabic pseudowords (Experiment 2). The participants’ task was to categorize an /ɑ-aː/ continuum that followed the word lists. The monosyllabic condition was perceived as slower (i.e., fewer /aː/ responses) than the bisyllabic and trisyllabic condition. However, no difference was observed between bisyllabic and trisyllabic contexts. Therefore, while syllable rate is used in perceiving speech rate, other factors, such as fast speech processes, mean F0, and intensity, must also influence rate normalization.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Shkaravska, O., Van Eekelen, M., & Tamalet, A. (2014). Collected size semantics for strict functional programs over general polymorphic lists. In U. Dal Lago, & R. Pena (Eds.), Foundational and Practical Aspects of Resource Analysis: Third International Workshop, FOPARA 2013, Bertinoro, Italy, August 29-31, 2013, Revised Selected Papers (pp. 143-159). Berlin: Springer.

    Abstract

    Size analysis can be an important part of heap consumption analysis. This paper is a part of ongoing work about typing support for checking output-on-input size dependencies for function definitions in a strict functional language. A significant restriction for our earlier results is that inner data structures (e.g. in a list of lists) all must have the same size. Here, we make a big step forwards by overcoming this limitation via the introduction of higher-order size annotations such that variate sizes of inner data structures can be expressed. In this way the analysis becomes applicable for general, polymorphic nested lists.
  • Siahaan, P., & Wijaya Rajeg, G. P. (2023). Multimodal language use in Indonesian: Recurrent gestures associated with negation. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527196.

    Abstract

    This paper presents research findings on manual gestures associated with negation in Indonesian, utilizing data sourced from talk shows available on YouTube. The study reveals that Indonesian speakers employ six recurrent negation gestures, which have been observed in various languages worldwide. This suggests that gestures exhibiting a stable form-meaning relationship and recurring frequently in relation to negation are prevalent around the globe, although their distribution may differ across cultures and languages. Furthermore, the paper demonstrates that negation gestures are not strictly tied to verbal negation. Overall, the aim of this paper is to contribute to a deeper understanding of the conventional usage and cross-linguistic distribution of recurrent gestures.
  • Slonimska, A., & Roberts, S. G. (2017). A case for systematic sound symbolism in pragmatics: The role of the first phoneme in question prediction in context. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1090-1095). Austin, TX: Cognitive Science Society.

    Abstract

    Turn-taking in conversation is a cognitively demanding process that proceeds rapidly due to interlocutors utilizing a range of cues to aid prediction. In the present study we set out to test recent claims that content question words (also called wh-words) sound similar within languages as an adaptation to help listeners predict that a question is about to be asked. We test whether upcoming questions can be predicted based on the first phoneme of a turn and the prior context. We analyze the Switchboard corpus of English by means of a decision tree to test whether /w/ and /h/ are good statistical cues of upcoming questions in conversation. Based on the results, we perform a controlled experiment to test whether people really use these cues to recognize questions. In both studies we show that both the initial phoneme and the sequential context help predict questions. This contributes converging evidence that elements of languages adapt to pragmatic pressures applied during conversation.
  • De Smedt, K., Hinrichs, E., Meurers, D., Skadiņa, I., Sanford Pedersen, B., Navarretta, C., Bel, N., Lindén, K., Lopatková, M., Hajič, J., Andersen, G., & Lenkiewicz, P. (2014). CLARA: A new generation of researchers in common language resources and their applications. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of LREC 2014: 9th International Conference on Language Resources and Evaluation (pp. 2166-2174).
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). Examining strains and symptoms of the ‘Literacy Virus’: The effects of orthographic transparency on phonological processing in a connectionist model of reading. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    The effect of literacy on phonological processing has been described in terms of a virus that “infects all speech processing” (Frith, 1998). Empirical data has established that literacy leads to changes to the way in which phonological information is processed. Harm & Seidenberg (1999) demonstrated that a connectionist network trained to map between English orthographic and phonological representations displays more componential phonological processing than a network trained only to stably represent the phonological forms of words. Within this study we use a similar model yet manipulate the transparency of orthographic-to-phonological mappings. We observe that networks trained on a transparent orthography are better at restoring phonetic features and phonemes. However, networks trained on non-transparent orthographies are more likely to restore corrupted phonological segments with legal, coarser linguistic units (e.g. onset, coda). Our study therefore provides an explicit description of how differences in orthographic transparency can lead to varying strains and symptoms of the ‘literacy virus’.
  • Smith, A. C., Monaghan, P., & Huettig, F. (2014). A comprehensive model of spoken word recognition must be multimodal: Evidence from studies of language-mediated visual attention. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014). Austin, TX: Cognitive Science Society.

    Abstract

    When processing language, the cognitive system has access to information from a range of modalities (e.g. auditory, visual) to support language processing. Language mediated visual attention studies have shown sensitivity of the listener to phonological, visual, and semantic similarity when processing a word. In a computational model of language mediated visual attention, that models spoken word processing as the parallel integration of information from phonological, semantic and visual processing streams, we simulate such effects of competition within modalities. Our simulations raised untested predictions about stronger and earlier effects of visual and semantic similarity compared to phonological similarity around the rhyme of the word. Two visual world studies confirmed these predictions. The model and behavioral studies suggest that, during spoken word comprehension, multimodal information can be recruited rapidly to constrain lexical selection to the extent that phonological rhyme information may exert little influence on this process.
  • Stanojevic, M., & Alhama, R. G. (2017). Neural discontinuous constituency parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 1666-1676). Association for Computational Linguistics.

    Abstract

    One of the most pressing issues in discontinuous constituency transition-based parsing is that the relevant information for parsing decisions could be located in any part of the stack or the buffer. In this paper, we propose a solution to this problem by replacing the structured perceptron model with a recursive neural model that computes a global representation of the configuration, therefore allowing even the most remote parts of the configuration to influence the parsing decisions. We also provide a detailed analysis of how this representation should be built out of sub-representations of its core elements (words, trees and stack). Additionally, we investigate how different types of swap oracles influence the results. Our model is the first neural discontinuous constituency parser, and it outperforms all the previously published models on three out of four datasets while on the fourth it obtains second place by a tiny difference.

    Additional information

    http://aclweb.org/anthology/D17-1174
  • Stern, G. (2023). On embodied use of recognitional demonstratives. In W. Pouw, J. Trujillo, H. R. Bosker, L. Drijvers, M. Hoetjes, J. Holler, S. Kadava, L. Van Maastricht, E. Mamus, & A. Ozyurek (Eds.), Gesture and Speech in Interaction (GeSpIn) Conference. doi:10.17617/2.3527204.

    Abstract

    This study focuses on embodied uses of recognitional demonstratives. While multimodal conversation analytic studies have shown how gesture and speech interact in the elaboration of exophoric references, little attention has been given to the multimodal configuration of other types of referential actions. Based on a video-recorded corpus of professional meetings held in French, this qualitative study shows that a subtype of deictic references, namely recognitional references, are frequently associated with iconic gestures, thus challenging the traditional distinction between exophoric and endophoric uses of deixis.
  • Sumer, B., Grabitz, C., & Küntay, A. (2017). Early produced signs are iconic: Evidence from Turkish Sign Language. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3273-3278). Austin, TX: Cognitive Science Society.

    Abstract

    Motivated form-meaning mappings are pervasive in sign languages, and iconicity has recently been shown to facilitate sign learning from early on. This study investigated the role of iconicity for language acquisition in Turkish Sign Language (TID). Participants were 43 signing children (aged 10 to 45 months) of deaf parents. Sign production ability was recorded using the adapted version of MacArthur Bates Communicative Developmental Inventory (CDI) consisting of 500 items for TID. Iconicity and familiarity ratings for a subset of 104 signs were available. Our results revealed that the iconicity of a sign was positively correlated with the percentage of children producing a sign and that iconicity significantly predicted the percentage of children producing a sign, independent of familiarity or phonological complexity. Our results are consistent with previous findings on sign language acquisition and provide further support for the facilitating effect of iconic form-meaning mappings in sign learning.
  • Sumer, B., Perniss, P., Zwitserlood, I., & Ozyurek, A. (2014). Learning to express "left-right" & "front-behind" in a sign versus spoken language. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1550-1555). Austin, Tx: Cognitive Science Society.

    Abstract

    Developmental studies show that it takes longer for children learning spoken languages to acquire viewpoint-dependent spatial relations (e.g., left-right, front-behind), compared to ones that are not viewpoint-dependent (e.g., in, on, under). The current study investigates how children learn to express viewpoint-dependent relations in a sign language where depicted spatial relations can be communicated in an analogue manner in the space in front of the body or by using body-anchored signs (e.g., tapping the right and left hand/arm to mean left and right). Our results indicate that the visual-spatial modality might have a facilitating effect on learning to express these spatial relations (especially in encoding of left-right) in a sign language (i.e., Turkish Sign Language) compared to a spoken language (i.e., Turkish).
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Turn-taking in social talk dialogues: Temporal, formal and functional aspects. In 9th International Conference Speech and Computer (SPECOM'2004) (pp. 454-461).

    Abstract

    This paper presents a quantitative analysis of the turn-taking mechanism evidenced in 93 telephone dialogues that were taken from the 9-million-word Spoken Dutch Corpus. While the first part of the paper focuses on the temporal phenomena of turn taking, such as durations of pauses and overlaps of turns in the dialogues, the second part explores the discourse-functional aspects of utterances in a subset of 8 dialogues that were annotated especially for this purpose. The results show that speakers adapt their turn-taking behaviour to the interlocutor’s behaviour. Furthermore, the results indicate that male-male dialogues show a higher proportion of overlapping turns than female-female dialogues.
  • Ten Bosch, L., Ernestus, M., & Boves, L. (2014). Comparing reaction time sequences from human participants and computational models. In Proceedings of Interspeech 2014: 15th Annual Conference of the International Speech Communication Association (pp. 462-466).

    Abstract

    This paper addresses the question of how to compare reaction times computed by a computational model of speech comprehension with reaction times observed from participants. The question is based on the observation that reaction time sequences differ substantially per participant, which raises the issue of how exactly the model is to be assessed. Part of the variation in reaction time sequences is caused by the so-called local speed: the current reaction time correlates to some extent with a number of previous reaction times, due to slowly varying fluctuations in attention, fatigue, etc. This paper proposes a method, based on time series analysis, to filter the observed reaction times in order to separate out the local speed effects. Results show that after such filtering both the between-participant correlations and the average correlation between participant and model increase. The presented technique provides insights into relevant aspects that are to be taken into account when comparing reaction time sequences.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Durational aspects of turn-taking in spontaneous face-to-face and telephone dialogues. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: Proceedings of the 7th International Conference TSD 2004 (pp. 563-570). Heidelberg: Springer.

    Abstract

    On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and face-to-face dialogues have different statistical properties. Pauses in a face-to-face dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2017). The recognition of compounds: A computational account. In Proceedings of Interspeech 2017 (pp. 1158-1162). doi:10.21437/Interspeech.2017-1048.

    Abstract

    This paper investigates the processes involved in comprehending spoken noun-noun compounds, using data from the BALDEY database. BALDEY contains lexicality judgments and reaction times (RTs) for Dutch stimuli, for which linguistic information is also included. Two different approaches are combined. The first is based on regression by Dynamic Survival Analysis, which models decisions and RTs as a consequence of the fact that a cumulative density function exceeds some threshold. The parameters of that function are estimated from the observed RT data. The second approach is based on DIANA, a process-oriented computational model of human word comprehension, which simulates the comprehension process with the acoustic stimulus as input. DIANA gives the identity and the number of the word candidates that are activated at each 10 ms time step.

    Both approaches show how the processes involved in comprehending compounds change during a stimulus. Survival Analysis shows that the impact of word duration varies during the course of a stimulus. The density of word and non-word hypotheses in DIANA shows a corresponding pattern with different regimes. We show how the approaches complement each other, and discuss additional ways in which data and process models can be combined.
  • Torreira, F., Roberts, S. G., & Hammarström, H. (2014). Functional trade-off between lexical tone and intonation: Typological evidence from polar-question marking. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Language (pp. 100-103).

    Abstract

    Tone languages are often reported to make use of utterance-level intonation as well as of lexical tone. We test the alternative hypotheses that a) the coexistence of lexical tone and utterance-level intonation in tone languages results in a diminished functional load for intonation, and b) that lexical tone and intonation can coexist in tone languages without undermining each other’s functional load in a substantial way. In order to do this, we collected data from two large typological databases, and performed mixed-effects and phylogenetic regression analyses controlling for genealogical and areal factors to estimate the probability of a language exhibiting grammatical devices for encoding polar questions given its status as a tonal or an intonation-only language. Our analyses indicate that, while both tone and intonational languages tend to develop grammatical devices for marking polar questions above chance level, tone languages do this at a significantly higher frequency, with estimated probabilities ranging between 0.88 and 0.98. This statistical bias provides cross-linguistic empirical support to the view that the use of tonal features to mark lexical contrasts leads to a diminished functional load for utterance-level intonation.
