Publications

Displaying 101 - 170 of 170
  • Little, H., Perlman, M., & Eryilmaz, K. (2017). Repeated interactions can lead to more iconic signals. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 760-765). Austin, TX: Cognitive Science Society.

    Abstract

    Previous research has shown that repeated interactions can cause iconicity in signals to reduce. However, data from several recent studies has shown the opposite trend: an increase in iconicity as the result of repeated interactions. Here, we discuss whether signals may become less or more iconic as a result of the modality used to produce them. We review several recent experimental results before presenting new data from multi-modal signals, where visual input creates audio feedback. Our results show that the growth in iconicity present in the audio information may come at a cost to iconicity in the visual information. Our results have implications for how we think about and measure iconicity in artificial signalling experiments. Further, we discuss how iconicity in real world speech may stem from auditory, kinetic or visual information, but iconicity in these different modalities may conflict.
  • Little, H. (Ed.). (2017). Special Issue on the Emergence of Sound Systems [Special Issue]. The Journal of Language Evolution, 2(1).
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Mamus, E., Speed, L. J., Ozyurek, A., & Majid, A. (2021). Sensory modality of input influences encoding of motion events in speech but not co-speech gestures. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 376-382). Vienna: Cognitive Science Society.

    Abstract

    Visual and auditory channels have different affordances and
    this is mirrored in what information is available for linguistic
    encoding. The visual channel has high spatial acuity, whereas
    the auditory channel has better temporal acuity. These
    differences may lead to different conceptualizations of events
    and affect multimodal language production. Previous studies of
    motion events typically present visual input to elicit speech and
    gesture. The present study compared events presented as audio-
    only, visual-only, or multimodal (visual+audio) input and
    assessed speech and co-speech gesture for path and manner of
    motion in Turkish. Speakers with audio-only input mentioned
    path more and manner less in verbal descriptions, compared to
    speakers who had visual input. There was no difference in the
    type or frequency of gestures across conditions, and gestures
    were dominated by path-only gestures. This suggests that input
    modality influences speakers’ encoding of path and manner of
    motion events in speech, but not in co-speech gestures.
  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2017). Whether long-term tracking of speech rate affects perception depends on who is talking. In Proceedings of Interspeech 2017 (pp. 586-590). doi:10.21437/Interspeech.2017-1517.

    Abstract

    Speech rate is known to modulate perception of temporally ambiguous speech sounds. For instance, a vowel may be perceived as short when the immediate speech context is slow, but as long when the context is fast. Yet, effects of long-term tracking of speech rate are largely unexplored. Two experiments tested whether long-term tracking of rate influences perception of the temporal Dutch vowel contrast /ɑ/-/a:/. In Experiment 1, one low-rate group listened to 'neutral' rate speech from talker A and to slow speech from talker B. Another high-rate group was exposed to the same neutral speech from A, but to fast speech from B. Between-group comparison of the 'neutral' trials revealed that the low-rate group reported a higher proportion of /a:/ in A's 'neutral' speech, indicating that A sounded faster when B was slow. Experiment 2 tested whether one's own speech rate also contributes to effects of long-term tracking of rate. Here, talker B's speech was replaced by playback of participants' own fast or slow speech. No evidence was found that one's own voice affected perception of talker A in larger speech contexts. These results carry implications for our understanding of the mechanisms involved in rate-dependent speech perception and of dialogue.
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • Merkx, D., & Frank, S. L. (2021). Human sentence processing: Recurrence or attention? In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) (pp. 12-22). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL). doi:10.18653/v1/2021.cmcl-1.2.

    Abstract

    Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models’ ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and provides evidence for cue-based retrieval.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can just as important.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Monaghan, P., Brand, J., Frost, R. L. A., & Taylor, G. (2017). Multiple variable cues in the environment promote accurate and robust word learning. In G. Gunzelman, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 817-822). Retrieved from https://mindmodeling.org/cogsci2017/papers/0164/index.html.

    Abstract

    Learning how words refer to aspects of the environment is a complex task, but one that is supported by numerous cues within the environment which constrain the possibilities for matching words to their intended referents. In this paper we tested the predictions of a computational model of multiple cue integration for word learning, that predicted variation in the presence of cues provides an optimal learning situation. In a cross-situational learning task with adult participants, we varied the reliability of presence of distributional, prosodic, and gestural cues. We found that the best learning occurred when cues were often present, but not always. The effect of variability increased the salience of individual cues for the learner, but resulted in robust learning that was not vulnerable to individual cues’ presence or absence. Thus, variability of multiple cues in the language-learning environment provided the optimal circumstances for word learning.
  • Mudd, K., Lutzenberger, H., De Vos, C., & De Boer, B. (2021). Social structure and lexical uniformity: A case study of gender differences in the Kata Kolok community. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2692-2698). Vienna: Cognitive Science Society.

    Abstract

    Language emergence is characterized by a high degree of lex-
    ical variation. It has been suggested that the speed at which
    lexical conventionalization occurs depends partially on social
    structure. In large communities, individuals receive input from
    many sources, creating a pressure for lexical convergence.
    In small, insular communities, individuals can remember id-
    iolects and share common ground with interlocuters, allow-
    ing these communities to retain a high degree of lexical vari-
    ation. We look at lexical variation in Kata Kolok, a sign lan-
    guage which emerged six generations ago in a Balinese vil-
    lage, where women tend to have more tightly-knit social net-
    works than men. We test if there are differing degrees of lexical
    uniformity between women and men by reanalyzing a picture
    description task in Kata Kolok. We find that women’s produc-
    tions exhibit less lexical uniformity than men’s. One possible
    explanation of this finding is that women’s more tightly-knit
    social networks allow for remembering idiolects, alleviating
    the pressure for lexical convergence, but social network data
    from the Kata Kolok community is needed to support this ex-
    planation.
  • Ortega, G., Schiefner, A., & Ozyurek, A. (2017). Speakers’ gestures predict the meaning and perception of iconicity in signs. In G. Gunzelmann, A. Howe, & T. Tenbrink (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 889-894). Austin, TX: Cognitive Science Society.

    Abstract

    Sign languages stand out in that there is high prevalence of
    conventionalised linguistic forms that map directly to their
    referent (i.e., iconic). Hearing adults show low performance
    when asked to guess the meaning of iconic signs suggesting
    that their iconic features are largely inaccessible to them.
    However, it has not been investigated whether speakers’
    gestures, which also share the property of iconicity, may
    assist non-signers in guessing the meaning of signs. Results
    from a pantomime generation task (Study 1) show that
    speakers’ gestures exhibit a high degree of systematicity, and
    share different degrees of form overlap with signs (full,
    partial, and no overlap). Study 2 shows that signs with full
    and partial overlap are more accurately guessed and are
    assigned higher iconicity ratings than signs with no overlap.
    Deaf and hearing adults converge in their iconic depictions
    for some concepts due to the shared conceptual knowledge
    and manual-visual modality.
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Perlman, M., Fusaroli, R., Fein, D., & Naigles, L. (2017). The use of iconic words in early child-parent interactions. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 913-918). Austin, TX: Cognitive Science Society.

    Abstract

    This paper examines the use of iconic words in early conversations between children and caregivers. The longitudinal data include a span of six observations of 35 children-parent dyads in the same semi-structured activity. Our findings show that children’s speech initially has a high proportion of iconic words, and over time, these words become diluted by an increase of arbitrary words. Parents’ speech is also initially high in iconic words, with a decrease in the proportion of iconic words over time – in this case driven by the use of fewer iconic words. The level and development of iconicity are related to individual differences in the children’s cognitive skills. Our findings fit with the hypothesis that iconicity facilitates early word learning and may play an important role in learning to produce new words.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Popov, V., Ostarek, M., & Tenison, C. (2017). Inferential Pitfalls in Decoding Neural Representations. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 961-966). Austin, TX: Cognitive Science Society.

    Abstract

    A key challenge for cognitive neuroscience is to decipher the representational schemes of the brain. A recent class of decoding algorithms for fMRI data, stimulus-feature-based encoding models, is becoming increasingly popular for inferring the dimensions of neural representational spaces from stimulus-feature spaces. We argue that such inferences are not always valid, because decoding can occur even if the neural representational space and the stimulus-feature space use different representational schemes. This can happen when there is a systematic mapping between them. In a simulation, we successfully decoded the binary representation of numbers from their decimal features. Since binary and decimal number systems use different representations, we cannot conclude that the binary representation encodes decimal features. The same argument applies to the decoding of neural patterns from stimulus-feature spaces and we urge caution in inferring the nature of the neural code from such methods. We discuss ways to overcome these inferential limitations.
  • Pouw, W., Wit, J., Bögels, S., Rasenberg, M., Milivojevic, B., & Ozyurek, A. (2021). Semantically related gestures move alike: Towards a distributional semantics of gesture kinematics. In V. G. Duffy (Ed.), Digital human modeling and applications in health, safety, ergonomics and risk management. human body, motion and behavior:12th International Conference, DHM 2021, Held as Part of the 23rd HCI International Conference, HCII 2021 (pp. 269-287). Berlin: Springer. doi:10.1007/978-3-030-77817-0_20.
  • Pouw, W., Aslanidou, A., Kamermans, K. L., & Paas, F. (2017). Is ambiguity detection in haptic imagery possible? Evidence for Enactive imaginings. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 2925-2930). Austin, TX: Cognitive Science Society.

    Abstract

    A classic discussion about visual imagery is whether it affords reinterpretation, like discovering two interpretations in the duck/rabbit illustration. Recent findings converge on reinterpretation being possible in visual imagery, suggesting functional equivalence with pictorial representations. However, it is unclear whether such reinterpretations are necessarily a visual-pictorial achievement. To assess this, 68 participants were briefly presented 2-d ambiguous figures. One figure was presented visually, the other via manual touch alone. Afterwards participants mentally rotated the memorized figures as to discover a novel interpretation. A portion (20.6%) of the participants detected a novel interpretation in visual imagery, replicating previous research. Strikingly, 23.6% of participants were able to reinterpret figures they had only felt. That reinterpretation truly involved haptic processes was further supported, as some participants performed co-thought gestures on an imagined figure during retrieval. These results are promising for further development of an Enactivist approach to imagination.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G. and 2 moreSchuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., & Zafeiriou, S. (2017). The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold & snoring. In Proceedings of Interspeech 2017 (pp. 3442-3446). doi:10.21437/Interspeech.2017-43.

    Abstract

    The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in research competition under well-defined conditions: In the Addressee sub-challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold sub-challenge, speech under cold has to be told apart from ‘healthy’ speech; and in the Snoring subchallenge, four different types of snoring have to be classified. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt feature representations by end-to-end learning with convolutional and recurrent neural networks, and bag-of-audiowords for the first time in the challenge series
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, D. R., & Cutler, A. (1982). Segmental cues to syntactic structure. In Proceedings of the Institute of Acoustics 'Spectral Analysis and its Use in Underwater Acoustics' (pp. E3.1-E3.4). London: Institute of Acoustics.
  • Sekine, K. (2017). Gestural hesitation reveals children’s competence on multimodal communication: Emergence of disguised adaptor. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3113-3118). Austin, TX: Cognitive Science Society.

    Abstract

    Speakers sometimes modify their gestures during the process of production into adaptors such as hair touching or eye scratching. Such disguised adaptors are evidence that the speaker can monitor their gestures. In this study, we investigated when and how disguised adaptors are first produced by children. Sixty elementary school children participated in this study (ten children in each age group; from 7 to 12 years old). They were instructed to watch a cartoon and retell it to their parents. The results showed that children did not produce disguised adaptors until the age of 8. The disguised adaptors accompany fluent speech until the children are 10 years old and accompany dysfluent speech until they reach 11 or 12 years of age. These results suggest that children start to monitor their gestures when they are 9 or 10 years old. Cognitive changes were considered as factors to influence emergence of disguised adaptors
  • Senft, G. (1991). Bakavilisi Biga - we can 'turn' the language - or: What happens to English words in Kilivila language? In W. Bahner, J. Schildt, & D. Viehwegger (Eds.), Proceedings of the XIVth International Congress of Linguists (pp. 1743-1746). Berlin: Akademie Verlag.
  • Seuren, P. A. M. (1975). Autonomous syntax and prelexical rules. In S. De Vriendt, J. Dierickx, & M. Wilmet (Eds.), Grammaire générative et psychomécanique du langage: actes du colloque organisé par le Centre d'études linguistiques et littéraires de la Vrije Universiteit Brussel, Bruxelles, 29-31 mai 1974 (pp. 89-98). Paris: Didier.
  • Seuren, P. A. M. (1975). Logic and language. In S. De Vriendt, J. Dierickx, & M. Wilmet (Eds.), Grammaire générative et psychomécanique du langage: actes du colloque organisé par le Centre d'études linguistiques et littéraires de la Vrije Universiteit Brussel, Bruxelles, 29-31 mai 1974 (pp. 84-87). Paris: Didier.
  • Seuren, P. A. M. (1991). Notes on noun phrases and quantification. In Proceedings of the International Conference on Current Issues in Computational Linguistics (pp. 19-44). Penang, Malaysia: Universiti Sains Malaysia.
  • Seuren, P. A. M. (1982). Riorientamenti metodologici nello studio della variabilità linguistica. In D. Gambarara, & A. D'Atri (Eds.), Ideologia, filosofia e linguistica: Atti del Convegno Internazionale di Studi, Rende (CS) 15-17 Settembre 1978 ( (pp. 499-515). Roma: Bulzoni.
  • Seuren, P. A. M. (1991). What makes a text untranslatable? In H. M. N. Noor Ein, & H. S. Atiah (Eds.), Pragmatik Penterjemahan: Prinsip, Amalan dan Penilaian Menuju ke Abad 21 ("The Pragmatics of Translation: Principles, Practice and Evaluation Moving towards the 21st Century") (pp. 19-27). Kuala Lumpur: Dewan Bahasa dan Pustaka.
  • Li, Y., Wu, S., Shi, S., Tong, S., Zhang, Y., & Guo, X. (2021). Enhanced inter-brain connectivity between children and adults during cooperation: a dual EEG study. In 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC) (pp. 6289-6292). doi:10.1109/EMBC46164.2021.9630330.

    Abstract

    Previous fNIRS studies have suggested that adult-child cooperation is accompanied by increased inter-brain synchrony. However, its reflection in the electrophysiological synchrony remains unclear. In this study, we designed a naturalistic and well-controlled adult-child interaction paradigm using a tangram solving video game, and recorded dual-EEG from child and adult dyads during cooperative and individual conditions. By calculating the directed inter-brain connectivity in the theta and alpha bands, we found that the inter-brain frontal network was more densely connected and stronger in strength during the cooperative than the individual condition when the adult was watching the child playing. Moreover, the inter-brain network across different dyads shared more common information flows from the player to the observer during cooperation, but was more individually different in solo play. The results suggest an enhancement in inter-brain EEG interactions during adult-child cooperation. However, the enhancement was evident in all cooperative cases but partly depended on the role of participants.
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN
  • Slonimska, A., & Roberts, S. G. (2017). A case for systematic sound symbolism in pragmatics:The role of the first phoneme in question prediction in context. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 1090-1095). Austin, TX: Cognitive Science Society.

    Abstract

    Turn-taking in conversation is a cognitively demanding process that proceeds rapidly due to interlocutors utilizing a range of cues
    to aid prediction. In the present study we set out to test recent claims that content question words (also called wh-words) sound similar within languages as an adaptation to help listeners predict
    that a question is about to be asked. We test whether upcoming questions can be predicted based on the first phoneme of a turn and the prior context. We analyze the Switchboard corpus of English
    by means of a decision tree to test whether /w/ and /h/ are good statistical cues of upcoming questions in conversation. Based on the results, we perform a controlled experiment to test whether
    people really use these cues to recognize questions. In both studies
    we show that both the initial phoneme and the sequential context help predict questions. This contributes converging evidence that elements of languages adapt to pragmatic pressures applied during
    conversation.
  • De Sousa, H. (2008). The development of echo-subject markers in Southern Vanuatu. In T. J. Curnow (Ed.), Selected papers from the 2007 Conference of the Australian Linguistic Society. Australian Linguistic Society.

    Abstract

    One of the defining features of the Southern Vanuatu language family is the echo-subject (ES) marker (Lynch 2001: 177-178). Canonically, an ES marker indicates that the subject of the clause is coreferential with the subject of the preceding clause. This paper begins with a survey of the various ES systems found in Southern Vanuatu. Two prominent differences amongst the ES systems are: a) the level of obligatoriness of the ES marker; and b) the level of grammatical integration between an ES clauses and the preceding clause. The variation found amongst the ES systems reveals a clear path of grammaticalisation from the VP coordinator *ma in Proto–Southern Vanuatu to the various types of ES marker in contemporary Southern Vanuatu languages
  • Stanojevic, M., & Alhama, R. G. (2017). Neural discontinuous constituency parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 1666-1676). Association for Computational Linguistics.

    Abstract

    One of the most pressing issues in dis-
    continuous constituency transition-based
    parsing is that the relevant information for
    parsing decisions could be located in any
    part of the stack or the buffer. In this pa-
    per, we propose a solution to this prob-
    lem by replacing the structured percep-
    tron model with a recursive neural model
    that computes a global representation of
    the configuration, therefore allowing even
    the most remote parts of the configura-
    tion to influence the parsing decisions. We
    also provide a detailed analysis of how
    this representation should be built out of
    sub-representations of its core elements
    (words, trees and stack). Additionally, we
    investigate how different types of swap or-
    acles influence the results. Our model is
    the first neural discontinuous constituency
    parser, and it outperforms all the previ-
    ously published models on three out of
    four datasets while on the fourth it obtains
    second place by a tiny difference.

    Additional information

    http://aclweb.org/anthology/D17-1174
  • Stehouwer, H., & Van den Bosch, A. (2008). Putting the t where it belongs: Solving a confusion problem in Dutch. In S. Verberne, H. Van Halteren, & P.-A. Coppen (Eds.), Computational Linguistics in the Netherlands 2007: Selected Papers from the 18th CLIN Meeting (pp. 21-36). Utrecht: LOT.

    Abstract

    A common Dutch writing error is to confuse a word ending in -d with a neighbor word ending in -dt. In this paper we describe the development of a machine-learning-based disambiguator that can determine which word ending is appropriate, on the basis of its local context. We develop alternative disambiguators, varying between a single monolithic classifier and having multiple confusable experts disambiguate between confusable pairs. Disambiguation accuracy of the best developed disambiguators exceeds 99%; when we apply these disambiguators to an external test set of collected errors, our detection strategy correctly identifies up to 79% of the errors.
  • Sumer, B., Grabitz, C., & Küntay, A. (2017). Early produced signs are iconic: Evidence from Turkish Sign Language. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (CogSci 2017) (pp. 3273-3278). Austin, TX: Cognitive Science Society.

    Abstract

    Motivated form-meaning mappings are pervasive in sign languages, and iconicity has recently been shown to facilitate sign learning from early on. This study investigated the role of iconicity for language acquisition in Turkish Sign Language (TID). Participants were 43 signing children (aged 10 to 45 months) of deaf parents. Sign production ability was recorded using the adapted version of MacArthur Bates Communicative Developmental Inventory (CDI) consisting of 500 items for TID. Iconicity and familiarity ratings for a subset of 104 signs were available. Our results revealed that the iconicity of a sign was positively correlated with the percentage of children producing a sign and that iconicity significantly predicted the percentage of children producing a sign, independent of familiarity or phonological complexity. Our results are consistent with previous findings on sign language acquisition and provide further support for the facilitating effect of iconic form-meaning mappings in sign learning.
  • Ten Bosch, L., Boves, L., & Ernestus, M. (2017). The recognition of compounds: A computational account. In Proceedings of Interspeech 2017 (pp. 1158-1162). doi:10.21437/Interspeech.2017-1048.

    Abstract

    This paper investigates the processes in comprehending spoken noun-noun compounds, using data from the BALDEY database. BALDEY contains lexicality judgments and reaction times (RTs) for Dutch stimuli for which also linguistic information is included. Two different approaches are combined. The first is based on regression by Dynamic Survival Analysis, which models decisions and RTs as a consequence of the fact that a cumulative density function exceeds some threshold. The parameters of that function are estimated from the observed RT data. The second approach is based on DIANA, a process-oriented computational model of human word comprehension, which simulates the comprehension process with the acoustic stimulus as input. DIANA gives the identity and the number of the word candidates that are activated at each 10 ms time step.

    Both approaches show how the processes involved in comprehending compounds change during a stimulus. Survival Analysis shows that the impact of word duration varies during the course of a stimulus. The density of word and non-word hypotheses in DIANA shows a corresponding pattern with different regimes. We show how the approaches complement each other, and discuss additional ways in which data and process models can be combined.
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In C. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed
    over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning
    behind this initiative and how in practice such an archive is set up.
  • Tsoukala, C., Frank, S. L., & Broersma, M. (2017). “He's pregnant": Simulating the confusing case of gender pronoun errors in L2 English. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (CogSci 2017) (pp. 3392-3397). Austin, TX, USA: Cognitive Science Society.

    Abstract

    Even advanced Spanish speakers of second language English tend to confuse the pronouns ‘he’ and ‘she’, often without even noticing their mistake (Lahoz, 1991). A study by AntónMéndez (2010) has indicated that a possible reason for this error is the fact that Spanish is a pro-drop language. In order to test this hypothesis, we used an extension of Dual-path (Chang, 2002), a computational cognitive model of sentence production, to simulate two models of bilingual speech production of second language English. One model had Spanish (ES) as a native language, whereas the other learned a Spanish-like language that used the pronoun at all times (non-pro-drop Spanish, NPD_ES). When tested on L2 English sentences, the bilingual pro-drop Spanish model produced significantly more gender pronoun errors, confirming that pronoun dropping could indeed be responsible for the gender confusion in natural language use as well.
  • Van Dooren, A., Dieuleveut, A., Cournane, A., & Hacquard, V. (2017). Learning what must and can must and can mean. In A. Cremers, T. Van Gessel, & F. Roelofsen (Eds.), Proceedings of the 21st Amsterdam Colloquium (pp. 225-234). Amsterdam: ILLC.

    Abstract

    This corpus study investigates how children figure out that functional modals
    like must can express various flavors of modality. We examine how modality is
    expressed in speech to and by children, and find that the way speakers use
    modals may obscure their polysemy. Yet, children eventually figure it out. Our
    results suggest that some do before age 3. We show that while root and
    epistemic flavors are not equally well-represented in the input, there are robust
    correlations between flavor and aspect, which learners could exploit to discover
    modal polysemy.
  • Van Dooren, A. (2017). Dutch must more structure. In A. Lamont, & K. Tetzloff (Eds.), NELS 47: Proceedings of the Forty-Seventh Annual Meeting of the North East Linguistic Society (pp. 165-175). Amherst: GLSA.
  • Van Ooijen, B., Cutler, A., & Norris, D. (1991). Detection times for vowels versus consonants. In Eurospeech 91: Vol. 3 (pp. 1451-1454). Genova: Istituto Internazionale delle Comunicazioni.

    Abstract

    This paper reports two experiments with vowels and consonants as phoneme detection targets in real words. In the first experiment, two relatively distinct vowels were compared with two confusible stop consonants. Response times to the vowels were longer than to the consonants. Response times correlated negatively with target phoneme length. In the second, two relatively distinct vowels were compared with their corresponding semivowels. This time, the vowels were detected faster than the semivowels. We conclude that response time differences between vowels and stop consonants in this task may reflect differences between phoneme categories in the variability of tokens, both in the acoustic realisation of targets and in the' representation of targets by subjects.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and access directly the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs by making use of practical examples. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data
  • Váradi, T., Wittenburg, P., Krauwer, S., Wynne, M., & Koskenniemi, K. (2008). CLARIN: Common language resources and technology infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
  • Vernes, S. C., Janik, V. M., Fitch, W. T., & Slater, P. J. B. (Eds.). (2021). Vocal learning in animals and humans [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 376.
  • Vosse, T. G., & Kempen, G. (2008). Parsing verb-final clauses in German: Garden-path and ERP effects modeled by a parallel dynamic parser. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Conference on the Cognitive Science Society (pp. 261-266). Washington: Cognitive Science Society.

    Abstract

    Experimental sentence comprehension studies have shown that superficially similar German clauses with verb-final word order elicit very different garden-path and ERP effects. We show that a computer implementation of the Unification Space parser (Vosse & Kempen, 2000) in the form of a localist-connectionist network can model the observed differences, at least qualitatively. The model embodies a parallel dynamic parser that, in contrast with existing models, does not distinguish between consecutive first-pass and reanalysis stages, and does not use semantic or thematic roles. It does use structural frequency data and animacy information.
  • Vosse, T., & Kempen, G. (1991). A hybrid model of human sentence processing: Parsing right-branching, center-embedded and cross-serial dependencies. In M. Tomita (Ed.), Proceedings of the Second International Workshop on Parsing Technologies.
  • Weber, A., & Melinger, A. (2008). Name dominance in spoken word recognition is (not) modulated by expectations: Evidence from synonyms. In A. Botinis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (ExLing 2008) (pp. 225-228). Athens: University of Athens.

    Abstract

    Two German eye-tracking experiments tested whether top-down expectations interact with acoustically-driven word-recognition processes. Competitor objects with two synonymous names were paired with target objects whose names shared word onsets with either the dominant or the non-dominant name of the competitor. Non-dominant names of competitor objects were either introduced before the test session or not. Eye-movements were monitored while participants heard instructions to click on target objects. Results demonstrate dominant and non-dominant competitor names were considered for recognition, regardless of top-down expectations, though dominant names were always activated more strongly.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Weber, A. (2008). What the eyes can tell us about spoken-language comprehension [Abstract]. Journal of the Acoustical Society of America, 124, 2474-2474.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Zhang, Y., Ding, R., Frassinelli, D., Tuomainen, J., Klavinskis-Whiting, S., & Vigliocco, G. (2021). Electrophysiological signatures of second language multimodal comprehension. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2971-2977). Vienna: Cognitive Science Society.

    Abstract

    Language is multimodal: non-linguistic cues, such as prosody,
    gestures and mouth movements, are always present in face-to-
    face communication and interact to support processing. In this
    paper, we ask whether and how multimodal cues affect L2
    processing by recording EEG for highly proficient bilinguals
    when watching naturalistic materials. For each word, we
    quantified surprisal and the informativeness of prosody,
    gestures, and mouth movements. We found that each cue
    modulates the N400: prosodic accentuation, meaningful
    gestures, and informative mouth movements all reduce N400.
    Further, effects of meaningful gestures but not mouth
    informativeness are enhanced by prosodic accentuation,
    whereas effects of mouth are enhanced by meaningful gestures
    but reduced by beat gestures. Compared with L1, L2
    participants benefit less from cues and their interactions, except
    for meaningful gestures and mouth movements. Thus, in real-
    world language comprehension, L2 comprehenders use
    multimodal cues just as L1 speakers albeit to a lesser extent.
  • Zhang, Y., & Yu, C. (2017). How misleading cues influence referential uncertainty in statistical cross-situational learning. In M. LaMendola, & J. Scott (Eds.), Proceedings of the 41st Annual Boston University Conference on Language Development (BUCLD 41) (pp. 820-833). Boston, MA: Cascadilla Press.
  • Zhang, Y., Amatuni, A., Cain, E., Wang, X., Crandall, D., & Yu, C. (2021). Human learners integrate visual and linguistic information cross-situational verb learning. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2267-2273). Vienna: Cognitive Science Society.

    Abstract

    Learning verbs is challenging because it is difficult to infer the precise meaning of a verb when there are a multitude of relations that one can derive from a single event. To study this verb learning challenge, we used children's egocentric view collected from naturalistic toy-play interaction as learning materials and investigated how visual and linguistic information provided in individual naming moments as well as cross-situational information provided from multiple learning moments can help learners resolve this mapping problem using the Human Simulation Paradigm. Our results show that learners benefit from seeing children's egocentric views compared to third-person observations. In addition, linguistic information can help learners identify the correct verb meaning by eliminating possible meanings that do not belong to the linguistic category. Learners are also able to integrate visual and linguistic information both within and across learning situations to reduce the ambiguity in the space of possible verb meanings.
  • Zimianiti, E., Dimitrakopoulou, M., & Tsangalidis, A. (2021). Τhematic roles in dementia: The case of psychological verbs. In A. Botinis (Ed.), ExLing 2021: Proceedings of the 12th International Conference of Experimental Linguistics (pp. 269-272). Athens, Greece: ExLing Society.

    Abstract

    This study investigates the difficulty of people with Mild Cognitive Impairment (MCI), mild and moderate Alzheimer’s disease (AD) in the production and comprehension of psychological verbs, as thematic realization may involve both the canonical and non-canonical realization of arguments. More specifically, we aim to examine whether there is a deficit in the mapping of syntactic and semantic representations in psych-predicates regarding Greek-speaking individuals with MCI and AD, and whether the linguistic abilities associated with θ-role assignment decrease as the disease progresses. Moreover, given the decline of cognitive abilities in people with MCI and AD, we explore the effects of components of memory (Semantic, Episodic, and Working Memory) on the assignment of thematic roles in constructions with psychological verbs.
  • Zinn, C., Cablitz, G., Ringersma, J., Kemps-Snijders, M., & Wittenburg, P. (2008). Constructing knowledge spaces from linguistic resources. In Proceedings of the CIL 18 Workshop on Linguistic Studies of Ontology: From lexical semantics to formal ontologies and back.
  • Zinn, C. (2008). Conceptual spaces in ViCoS. In S. Bechhofer, M. Hauswirth, J. Hoffmann, & M. Koubarakis (Eds.), The semantic web: Research and applications (pp. 890-894). Berlin: Springer.

    Abstract

    We describe ViCoS, a tool for constructing and visualising conceptual spaces in the area of language documentation. ViCoS allows users to enrich existing lexical information about the words of a language with conceptual knowledge. Their work towards language-based, informal ontology building must be supported by easy-to-use workflows and supporting software, which we will demonstrate.
  • De Zubicaray, G., & Fisher, S. E. (Eds.). (2017). Genes, brain and language [Special Issue]. Brain and Language, 172.
  • Zwitserlood, I., Ozyurek, A., & Perniss, P. M. (2008). Annotation of sign and gesture cross-linguistically. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd, & I. Zwitserlood (Eds.), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages (pp. 185-190). Paris: ELDA.

    Abstract

    This paper discusses the construction of a cross-linguistic, bimodal corpus containing three modes of expression: expressions from two sign languages, speech and gestural expressions in two spoken languages and pantomimic expressions by users of two spoken languages who are requested to convey information without speaking. We discuss some problems and tentative solutions for the annotation of utterances expressing spatial information about referents in these three modes, suggesting a set of comparable codes for the description of both sign and gesture. Furthermore, we discuss the processing of entered annotations in ELAN, e.g. relating descriptive annotations to analytic annotations in all three modes and performing relational searches across annotations on different tiers.

Share this page