Publications

Displaying 1 - 100 of 168
  • Adank, P., Smits, R., & Van Hout, R. (2003). Modeling perceived vowel height, advancement, and rounding. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 647-650). Adelaide: Causal Productions.
  • Akamine, S., Kohatsu, T., Niikuni, K., Schafer, A. J., & Sato, M. (2022). Emotions in language processing: Affective priming in embodied cognition. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 326-332). Tokyo: Japanese Cognitive Science Society.
  • Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (2024). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2435-2442).

    Abstract

    When we communicate with others, we often repeat aspects of each other's communicative behavior such as sentence structures and words. Such behavioral alignment has been mostly studied for speech or text. Yet, language use is mostly multimodal, flexibly using speech and gestures to convey messages. Here, we explore the use of alignment in speech (words) and co-speech gestures (iconic gestures) in a referential communication task aimed at finding labels for novel objects in interaction. In particular, we investigate how people flexibly use lexical and gestural alignment to create shared labels for novel objects and whether alignment in speech and gesture are related over time. The present study shows that interlocutors establish shared labels multimodally, and alignment in words and iconic gestures are used throughout the interaction. We also show that the amount of lexical alignment positively associates with the amount of gestural alignment over time, suggesting a close relationship between alignment in the vocal and manual modalities.

    Additional information

    link to eScholarship
  • Bauer, B. L. M. (2022). Finite verb + infinite + object in later Latin: Early brace constructions? In G. V. M. Haverling (Ed.), Studies on Late and Vulgar Latin in the Early 21st Century: Acts of the 12th International Colloquium "Latin vulgaire – Latin tardif (pp. 166-181). Uppsala: Acta Universitatis Upsaliensis.
  • Bauer, B. L. M. (2003). The adverbial formation in mente in Vulgar and Late Latin: A problem in grammaticalization. In H. Solin, M. Leiwo, & H. Hallo-aho (Eds.), Latin vulgaire, latin tardif VI (pp. 439-457). Hildesheim: Olms.
  • Ben-Ami, S., Shukla, Vishakha, V., Gupta, P., Shah, P., Ralekar, C., Ganesh, S., Gilad-Gutnick, S., Rubio-Fernández, P., & Sinha, P. (2024). Form perception as a bridge to real-world functional proficiency. In L. K. Samuelson, S. L. Frank, M. Toneva, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 6094-6102).

    Abstract

    Recognizing the limitations of standard vision assessments in capturing the real-world capabilities of individuals with low vision, we investigated the potential of the Seguin Form Board Test (SFBT), a widely-used intelligence assessment employing a visuo-haptic shape-fitting task, as an estimator of vision's practical utility. We present findings from 23 children from India, who underwent treatment for congenital bilateral dense cataracts, and 21 control participants. To assess the development of functional visual ability, we conducted the SFBT and the standard measure of visual acuity, before and longitudinally after treatment. We observed a dissociation in the development of shape-fitting and visual acuity. Improvements of patients' shape-fitting preceded enhancements in their visual acuity after surgery and emerged even with acuity worse than that of control participants. Our findings highlight the importance of incorporating multi-modal and cognitive aspects into evaluations of visual proficiency in low-vision conditions, to better reflect vision's impact on daily activities.

    Additional information

    link to eScholarship
  • Berck, P., Bibiko, H.-J., Kemps-Snijders, M., Russel, A., & Wittenburg, P. (2006). Ontology-based language archive utilization. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2295-2298).
  • Bowerman, M., de León, L., & Choi, S. (1995). Verbs, particles, and spatial semantics: Learning to talk about spatial actions in typologically different languages. In E. V. Clark (Ed.), Proceedings of the Twenty-seventh Annual Child Language Research Forum (pp. 101-110). Stanford, CA: Center for the Study of Language and Information.
  • Broeder, D., Offenga, F., Wittenburg, P., Van de Kamp, P., Nathan, D., & Strömqvist, S. (2006). Technologies for a federation of language resource archive. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2291-2294).
  • Broeder, D., Van Veenendaal, R., Nathan, D., & Strömqvist, S. (2006). A grid of language resource repositories. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing.
  • Broeder, D., Claus, A., Offenga, F., Skiba, R., Trilsbeek, P., & Wittenburg, P. (2006). LAMUS: The Language Archive Management and Upload System. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2291-2294).
  • Broersma, M. (2006). Nonnative listeners rely less on phonetic information for phonetic categorization than native listeners. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 109-110).
  • Broersma, M. (2006). Accident - execute: Increased activation in nonnative listening. In Proceedings of Interspeech 2006 (pp. 1519-1522).

    Abstract

    Dutch and English listeners’ perception of English words with partially overlapping onsets (e.g., accident- execute) was investigated. Partially overlapping words remained active longer for nonnative listeners, causing an increase of lexical competition in nonnative compared with native listening.
  • Bruggeman, L., Yu, J., & Cutler, A. (2022). Listener adjustment of stress cue use to fit language vocabulary structure. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 264-267). doi:10.21437/SpeechProsody.2022-54.

    Abstract

    In lexical stress languages, phonemically identical syllables can differ suprasegmentally (in duration, amplitude, F0). Such stress
    cues allow listeners to speed spoken-word recognition by rejecting mismatching competitors (e.g., unstressed set- in settee
    rules out stressed set- in setting, setter, settle). Such processing effects have indeed been observed in Spanish, Dutch and German, but English listeners are known to largely ignore stress cues. Dutch and German listeners even outdo English listeners in distinguishing stressed versus unstressed English syllables. This has been attributed to the relative frequency across the stress languages of unstressed syllables with full vowels; in English most unstressed syllables contain schwa, instead, and stress cues on full vowels are thus least often informative in this language. If only informativeness matters, would English listeners who encounter situations where such cues would pay off for them (e.g., learning one of those other stress languages) then shift to using stress cues? Likewise, would stress cue users with English as L2, if mainly using English, shift away from
    using the cues in English? Here we report tests of these two questions, with each receiving a yes answer. We propose that
    English listeners’ disregard of stress cues is purely pragmatic.
  • Brugman, H., Malaisé, V., & Gazendam, L. (2006). A web based general thesaurus browser to support indexing of television and radio programs. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1488-1491).
  • Bujok, R., Meyer, A. S., & Bosker, H. R. (2022). Visible lexical stress cues on the face do not influence audiovisual speech perception. In S. Frota, M. Cruz, & M. Vigário (Eds.), Proceedings of Speech Prosody 2022 (pp. 259-263). doi:10.21437/SpeechProsody.2022-53.

    Abstract

    Producing lexical stress leads to visible changes on the face, such as longer duration and greater size of the opening of the mouth. Research suggests that these visual cues alone can inform participants about which syllable carries stress (i.e., lip-reading silent videos). This study aims to determine the influence of visual articulatory cues on lexical stress perception in more naturalistic audiovisual settings. Participants were presented with seven disyllabic, Dutch minimal stress pairs (e.g., VOORnaam [first name] & voorNAAM [respectable]) in audio-only (phonetic lexical stress continua without video), video-only (lip-reading silent videos), and audiovisual trials (e.g., phonetic lexical stress continua with video of talker saying VOORnaam or voorNAAM). Categorization data from video-only trials revealed that participants could distinguish the minimal pairs above chance from seeing the silent videos alone. However, responses in the audiovisual condition did not differ from the audio-only condition. We thus conclude that visual lexical stress information on the face, while clearly perceivable, does not play a major role in audiovisual speech perception. This study demonstrates that clear unimodal effects do not always generalize to more naturalistic multimodal communication, advocating that speech prosody is best considered in multimodal settings.
  • Butterfield, S., & Cutler, A. (1988). Segmentation errors by human listeners: Evidence for a prosodic segmentation strategy. In W. Ainsworth, & J. Holmes (Eds.), Proceedings of SPEECH ’88: Seventh Symposium of the Federation of Acoustic Societies of Europe: Vol. 3 (pp. 827-833). Edinburgh: Institute of Acoustics.
  • Cambier, N., Miletitch, R., Burraco, A. B., & Raviv, L. (2022). Prosociality in swarm robotics: A model to study self-domestication and language evolution. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 98-100). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Chen, Y., & Braun, B. (2006). Prosodic realization in information structure categories in standard Chinese. In R. Hoffmann, & H. Mixdorff (Eds.), Speech Prosody 2006. Dresden: TUD Press.

    Abstract

    This paper investigates the prosodic realization of information
    structure categories in Standard Chinese. A number of proper
    names with different tonal combinations were elicited as a
    grammatical subject in five pragmatic contexts. Results show
    that both duration and F0 range of the tonal realizations were
    adjusted to signal the information structure categories (i.e.
    theme vs. rheme and background vs. focus). Rhemes
    consistently induced a longer duration and a more expanded F0
    range than themes. Focus, compared to background, generally
    induced lengthening and F0 range expansion (the presence and
    magnitude of which, however, are dependent on the tonal
    structure of the proper names). Within the rheme focus
    condition, corrective rheme focus induced more expanded F0
    range than normal rheme focus.
  • Chen, A. (2006). Variations in the marking of focus in child language. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 113-114).
  • Chen, A. (2006). Interface between information structure and intonation in Dutch wh-questions. In R. Hoffmann, & H. Mixdorff (Eds.), Speech Prosody 2006. Dresden: TUD Press.

    Abstract

    This study set out to investigate how accent placement is pragmatically governed in WH-questions. Central to this issue are questions such as whether the intonation of the WH-word depends on the information structure of the non-WH word part, whether topical constituents can be accented, and whether constituents in the non-WH word part can be non-topical and accented. Previous approaches, based either on carefully composed examples or on read speech, differ in their treatments of these questions and consequently make opposing claims on the intonation of WH-questions. We addressed these questions by examining a corpus of 90 naturally occurring WH-questions, selected from the Spoken Dutch Corpus. Results show that the intonation of the WH-word is related to the information structure of the non-WH word part. Further, topical constituents can get accented and the accents are not necessarily phonetically reduced. Additionally, certain adverbs, which have no topical relation to the presupposition of the WH-questions, also get accented. They appear to function as a device for enhancing speaker engagement.
  • Chen, A. (2003). Language dependence in continuation intonation. In M. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS.) (pp. 1069-1072). Rundle Mall, SA, Austr.: Causal Productions Pty.
  • Chen, A. (2003). Reaction time as an indicator to discrete intonational contrasts in English. In Proceedings of Eurospeech 2003 (pp. 97-100).

    Abstract

    This paper reports a perceptual study using a semantically motivated identification task in which we investigated the nature of two pairs of intonational contrasts in English: (1) normal High accent vs. emphatic High accent; (2) early peak alignment vs. late peak alignment. Unlike previous inquiries, the present study employs an on-line method using the Reaction Time measurement, in addition to the measurement of response frequencies. Regarding the peak height continuum, the mean RTs are shortest for within-category identification but longest for across-category identification. As for the peak alignment contrast, no identification boundary emerges and the mean RTs only reflect a difference between peaks aligned with the vowel onset and peaks aligned elsewhere. We conclude that the peak height contrast is discrete but the previously claimed discreteness of the peak alignment contrast is not borne out.
  • Cheung, C.-Y., Yakpo, K., & Coupé, C. (2022). A computational simulation of the genesis and spread of lexical items in situations of abrupt language contact. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 115-122). Nijmegen: Joint Conference on Language Evolution (JCoLE).

    Abstract

    The current study presents an agent-based model which simulates the innovation and
    competition among lexical items in cases of language contact. It is inspired by relatively
    recent historical cases in which the linguistic ecology and sociohistorical context are highly complex. Pidgin and creole genesis offers an opportunity to obtain linguistic facts, social dynamics, and historical demography in a highly segregated society. This provides a solid ground for researching the interaction of populations with different pre-existing language systems, and how different factors contribute to the genesis of the lexicon of a newly generated mixed language. We take into consideration the population dynamics and structures, as well as a distribution of word frequencies related to language use, in order to study how social factors may affect the developmental trajectory of languages. Focusing on the case of Sranan in Suriname, our study shows that it is possible to account for the
    composition of its core lexicon in relation to different social groups, contact patterns, and
    large population movements.
  • Cheung, C.-Y., Kirby, S., & Raviv, L. (2024). The role of gender, social bias and personality traits in shaping linguistic accommodation: An experimental approach. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 80-82). Nijmegen: The Evolution of Language Conferences. doi:10.17617/2.3587960.
  • Cho, T. (2003). Lexical stress, phrasal accent and prosodic boundaries in the realization of domain-initial stops in Dutch. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhs 2003) (pp. 2657-2660). Adelaide: Causal Productions.

    Abstract

    This study examines the effects of prosodic boundaries, lexical stress, and phrasal accent on the acoustic realization of stops (/t, d/) in Dutch, with special attention paid to language-specificity in the phonetics-prosody interface. The results obtained from various acoustic measures show systematic phonetic variations in the production of /t d/ as a function of prosodic position, which may be interpreted as being due to prosodicallyconditioned articulatory strengthening. Shorter VOTs were found for the voiceless stop /t/ in prosodically stronger locations (as opposed to longer VOTs in this position in English). The results suggest that prosodically-driven phonetic realization is bounded by a language-specific phonological feature system.
  • Cos, F., Bujok, R., & Bosker, H. R. (2024). Test-retest reliability of audiovisual lexical stress perception after >1.5 years. In Y. Chen, A. Chen, & A. Arvaniti (Eds.), Proceedings of Speech Prosody 2024 (pp. 871-875). doi:10.21437/SpeechProsody.2024-176.

    Abstract

    In natural communication, we typically both see and hear our conversation partner. Speech comprehension thus requires the integration of auditory and visual information from the speech signal. This is for instance evidenced by the Manual McGurk effect, where the perception of lexical stress is biased towards the syllable that has a beat gesture aligned to it. However, there is considerable individual variation in how heavily gestural timing is weighed as a cue to stress. To assess within-individualconsistency, this study investigated the test-retest reliability of the Manual McGurk effect. We reran an earlier Manual McGurk experiment with the same participants, over 1.5 years later. At the group level, we successfully replicated the Manual McGurk effect with a similar effect size. However, a correlation of the by-participant effect sizes in the two identical experiments indicated that there was only a weak correlation between both tests, suggesting that the weighing of gestural information in the perception of lexical stress is stable at the group level, but less so in individuals. Findings are discussed in comparison to other measures of audiovisual integration in speech perception. Index Terms: Audiovisual integration, beat gestures, lexical stress, test-retest reliability
  • Crasborn, O., Sloetjes, H., Auer, E., & Wittenburg, P. (2006). Combining video and numeric data in the analysis of sign languages with the ELAN annotation software. In C. Vetoori (Ed.), Proceedings of the 2nd Workshop on the Representation and Processing of Sign languages: Lexicographic matters and didactic scenarios (pp. 82-87). Paris: ELRA.

    Abstract

    This paper describes hardware and software that can be used for the phonetic study of sign languages. The field of sign language phonetics is characterised, and the hardware that is currently in use is described. The paper focuses on the software that was developed to enable the recording of finger and hand movement data, and the additions to the ELAN annotation software that facilitate the further visualisation and analysis of the data.
  • Cutler, A., Murty, L., & Otake, T. (2003). Rhythmic similarity effects in non-native listening? In Proceedings of the 15th International Congress of Phonetic Sciences (PCPhS 2003) (pp. 329-332). Adelaide: Causal Productions.

    Abstract

    Listeners rely on native-language rhythm in segmenting speech; in different languages, stress-, syllable- or mora-based rhythm is exploited. This language-specificity affects listening to non- native speech, if native procedures are applied even though inefficient for the non-native language. However, speakers of two languages with similar rhythmic interpretation should segment their own and the other language similarly. This was observed to date only for related languages (English-Dutch; French-Spanish). We now report experiments in which Japanese listeners heard Telugu, a Dravidian language unrelated to Japanese, and Telugu listeners heard Japanese. In both cases detection of target sequences in speech was harder when target boundaries mismatched mora boundaries, exactly the pattern that Japanese listeners earlier exhibited with Japanese and other languages. These results suggest that Telugu and Japanese listeners use similar procedures in segmenting speech, and support the idea that languages fall into rhythmic classes, with aspects of phonological structure affecting listeners' speech segmentation.
  • Cutler, A., Kim, J., & Otake, T. (2006). On the limits of L1 influence on non-L1 listening: Evidence from Japanese perception of Korean. In P. Warren, & C. I. Watson (Eds.), Proceedings of the 11th Australian International Conference on Speech Science & Technology (pp. 106-111).

    Abstract

    Language-specific procedures which are efficient for listening to the L1 may be applied to non-native spoken input, often to the detriment of successful listening. However, such misapplications of L1-based listening do not always happen. We propose, based on the results from two experiments in which Japanese listeners detected target sequences in spoken Korean, that an L1 procedure is only triggered if requisite L1 features are present in the input.
  • Cutler, A., & Fear, B. D. (1991). Categoricality in acceptability judgements for strong versus weak vowels. In J. Llisterri (Ed.), Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles (pp. 18.1-18.5). Barcelona, Catalonia: Universitat Autonoma de Barcelona.

    Abstract

    A distinction between strong and weak vowels can be drawn on the basis of vowel quality, of stress, or of both factors. An experiment was conducted in which sets of contextually matched word-intial vowels ranging from clearly strong to clearly weak were cross-spliced, and the naturalness of the resulting words was rated by listeners. The ratings showed that in general cross-spliced words were only significantly less acceptable than unspliced words when schwa was not involved; this supports a categorical distinction based on vowel quality.
  • Cutler, A., & Pasveer, D. (2006). Explaining cross-linguistic differences in effects of lexical stress on spoken-word recognition. In R. Hoffmann, & H. Mixdorff (Eds.), Speech Prosody 2006. Dresden: TUD press.

    Abstract

    Experiments have revealed differences across languages in listeners’ use of stress information in recognising spoken words. Previous comparisons of the vocabulary of Spanish and English had suggested that the explanation of this asymmetry might lie in the extent to which considering stress in spokenword recognition allows rejection of unwanted competition from words embedded in other words. This hypothesis was tested on the vocabularies of Dutch and German, for which word recognition results resemble those from Spanish more than those from English. The vocabulary statistics likewise revealed that in each language, the reduction of embeddings resulting from taking stress into account is more similar to the reduction achieved in Spanish than in English.
  • Cutler, A., Eisner, F., McQueen, J. M., & Norris, D. (2006). Coping with speaker-related variation via abstract phonemic categories. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 31-32).
  • Cutler, A., & Butterfield, S. (1989). Natural speech cues to word segmentation under difficult listening conditions. In J. Tubach, & J. Mariani (Eds.), Proceedings of Eurospeech 89: European Conference on Speech Communication and Technology: Vol. 2 (pp. 372-375). Edinburgh: CEP Consultants.

    Abstract

    One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreover, they differentiate between word boundaries in a way which suggests they are sensitive to listener needs. Application of heuristic segmentation strategies makes word boundaries before strong syllables easiest for listeners to perceive; but under difficult listening conditions speakers pay more attention to marking word boundaries before weak syllables, i.e. they mark those boundaries which are otherwise particularly hard to perceive.
  • Cutler, A., & Chen, H.-C. (1995). Phonological similarity effects in Cantonese word recognition. In K. Elenius, & P. Branderud (Eds.), Proceedings of the Thirteenth International Congress of Phonetic Sciences: Vol. 1 (pp. 106-109). Stockholm: Stockholm University.

    Abstract

    Two lexical decision experiments in Cantonese are described in which the recognition of spoken target words as a function of phonological similarity to a preceding prime is investigated. Phonological similaritv in first syllables produced inhibition, while similarity in second syllables led to facilitation. Differences between syllables in tonal and segmental structure had generally similar effects.
  • Cutler, A. (1991). Prosody in situations of communication: Salience and segmentation. In Proceedings of the Twelfth International Congress of Phonetic Sciences: Vol. 1 (pp. 264-270). Aix-en-Provence: Université de Provence, Service des publications.

    Abstract

    Speakers and listeners have a shared goal: to communicate. The processes of speech perception and of speech production interact in many ways under the constraints of this communicative goal; such interaction is as characteristic of prosodic processing as of the processing of other aspects of linguistic structure. Two of the major uses of prosodic information in situations of communication are to encode salience and segmentation, and these themes unite the contributions to the symposium introduced by the present review.
  • Cutler, A. (1995). Universal and Language-Specific in the Development of Speech. Biology International, (Special Issue 33).
  • Dang, A., Raviv, L., & Galke, L. (2024). Testing the linguistic niche hypothesis in large with a multilingual Wug test. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 91-93). Nijmegen: The Evolution of Language Conferences.
  • Declerck, T., Cunningham, H., Saggion, H., Kuper, J., Reidsma, D., & Wittenburg, P. (2003). MUMIS - Advanced information extraction for multimedia indexing and searching digital media - Processing for multimedia interactive services. 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 553-556.
  • Dediu, D. (2006). Mostly out of Africa, but what did the others have to say? In A. Cangelosi, A. D. Smith, & K. Smith (Eds.), The evolution of language: proceedings of the 6th International Conference (EVOLANG6) (pp. 59-66). World Scientific.

    Abstract

    The Recent Out-of-Africa human evolutionary model seems to be generally accepted. This impression is very prevalent outside palaeoanthropological circles (including studies of language evolution), but proves to be unwarranted. This paper offers a short review of the main challenges facing ROA and concludes that alternative models based on the concept of metapopulation must be also considered. The implications of such a model for language evolution and diversity are briefly reviewed.
  • Dimitriadis, A., Kemps-Snijders, M., Wittenburg, P., Everaert, M., & Levinson, S. C. (2006). Towards a linguist's workbench supporting eScience methods. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing.
  • Dingemanse, M., Liesenfeld, A., & Woensdregt, M. (2022). Convergent cultural evolution of continuers (mhmm). In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 160-167). Nijmegen: Joint Conference on Language Evolution (JCoLE). doi:10.31234/osf.io/65c79.

    Abstract

    Continuers —words like mm, mmhm, uhum and the like— are among the most frequent types of responses in conversation. They play a key role in joint action coordination by showing positive evidence of understanding and scaffolding narrative delivery. Here we investigate the hypothesis that their functional importance along with their conversational ecology places selective pressures on their form and may lead to cross-linguistic similarities through convergent cultural evolution. We compare continuer tokens in linguistically diverse conversational corpora and find languages make available highly similar forms. We then approach the causal mechanism of convergent cultural evolution using exemplar modelling, simulating the process by which a combination of effort minimization and functional specialization may push continuers to a particular region of phonological possibility space. By combining comparative linguistics and computational modelling we shed new light on the question of how language structure is shaped by and for social interaction.
  • Dingemanse, M., & Liesenfeld, A. (2022). From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022) (pp. 5614 -5633). Dublin, Ireland: Association for Computational Linguistics.

    Abstract

    Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
  • Doherty, M., & Klein, W. (Eds.). (1991). Übersetzung [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (84).
  • Dona, L., & Schouwstra, M. (2022). The Role of Structural Priming, Semantics and Population Structure in Word Order Conventionalization: A Computational Model. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 171-173). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Dona, L., & Schouwstra, M. (2024). Balancing regularization and variation: The roles of priming and motivatedness. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 130-133). Nijmegen: The Evolution of Language Conferences.
  • Drude, S. (2003). Advanced glossing: A language documentation format and its implementation with Shoebox. In Proceedings of the 2002 International Conference on Language Resources and Evaluation (LREC 2002). Paris: ELRA.

    Abstract

    This paper presents Advanced Glossing, a proposal for a general glossing format designed for language documentation, and a specific setup for the Shoebox-program that implements Advanced Glossing to a large extent. Advanced Glossing (AG) goes beyond the traditional Interlinear Morphemic Translation, keeping syntactic and morphological information apart from each other in separate glossing tables. AG provides specific lines for different kinds of annotation – phonetic, phonological, orthographical, prosodic, categorial, structural, relational, and semantic, and it allows for gradual and successive, incomplete, and partial filling in case that some information may be irrelevant, unknown or uncertain. The implementation of AG in Shoebox sets up several databases. Each documented text is represented as a file of syntactic glossings. The morphological glossings are kept in a separate database. As an additional feature interaction with lexical databases is possible. The implementation makes use of the interlinearizing automatism provided by Shoebox, thus obtaining the table format for the alignment of lines in cells, and for semi-automatic filling-in of information in glossing tables which has been extracted from databases
  • Drude, S. (2003). Digitizing and annotating texts and field recordings in the Awetí project. In Proceedings of the EMELD Language Digitization Project Conference 2003. Workshop on Digitizing and Annotating Text and Field Recordings, LSA Institute, Michigan State University, July 11th -13th.

    Abstract

    Digitizing and annotating texts and field recordings Given that several initiatives worldwide currently explore the new field of documentation of endangered languages, the E-MELD project proposes to survey and unite procedures, techniques and results in order to achieve its main goal, ''the formulation and promulgation of best practice in linguistic markup of texts and lexicons''. In this context, this year's workshop deals with the processing of recorded texts. I assume the most valuable contribution I could make to the workshop is to show the procedures and methods used in the Awetí Language Documentation Project. The procedures applied in the Awetí Project are not necessarily representative of all the projects in the DOBES program, and they may very well fall short in several respects of being best practice, but I hope they might provide a good and concrete starting point for comparison, criticism and further discussion. The procedures to be exposed include: * taping with digital devices, * digitizing (preliminarily in the field, later definitely by the TIDEL-team at the Max Planck Institute in Nijmegen), * segmenting and transcribing, using the transcriber computer program, * translating (on paper, or while transcribing), * adding more specific annotation, using the Shoebox program, * converting the annotation to the ELAN-format developed by the TIDEL-team, and doing annotation with ELAN. Focus will be on the different types of annotation. Especially, I will present, justify and discuss Advanced Glossing, a text annotation format developed by H.-H. Lieb and myself designed for language documentation. It will be shown how Advanced Glossing can be applied using the Shoebox program. The Shoebox setup used in the Awetí Project will be shown in greater detail, including lexical databases and semi-automatic interaction between different database types (jumping, interlinearization). ( Freie Universität Berlin and Museu Paraense Emílio Goeldi, with funding from the Volkswagen Foundation.)
  • Duffield, N., & Matsuo, A. (2003). Factoring out the parallelism effect in ellipsis: An interactional approach? In J. Chilar, A. Franklin, D. Keizer, & I. Kimbara (Eds.), Proceedings of the 39th Annual Meeting of the Chicago Linguistic Society (CLS) (pp. 591-603). Chicago: Chicago Linguistics Society.

    Abstract

    Traditionally, there have been three standard assumptions made about the Parallelism Effect on VP-ellipsis, namely that the effect is categorical, that it applies asymmetrically and that it is uniquely due to syntactic factors. Based on the results of a series of experiments involving online and offline tasks, it will be argued that the Parallelism Effect is instead noncategorical and interactional. The factors investigated include construction type, conceptual and morpho-syntactic recoverability, finiteness and anaphor type (to test VP-anaphora). The results show that parallelism is gradient rather than categorical, effects both VP-ellipsis and anaphora, and is influenced by both structural and non-structural factors.
  • Enfield, N. J. (2006). Social consequences of common ground. In N. J. Enfield, & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 399-430). Oxford: Berg.
  • Enfield, N. J., & Levinson, S. C. (2006). Introduction: Human sociality as a new interdisciplinary field. In N. J. Enfield, & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 1-35). Oxford: Berg.
  • Fletcher, J., Kidd, E., Stoakes, H., & Nordlinger, R. (2022). Prosodic phrasing, pitch range, and word order variation in Murrinhpatha. In R. Billington (Ed.), Proceedings of the 18th Australasian International Conference on Speech Science and Technology (pp. 201-205). Canberra: Australasian Speech Science and Technology Association.

    Abstract

    Like many Indigenous Australian languages, Murrinhpatha has flexible word order with no apparent configurational syntax. We analyzed an experimental corpus of Murrinhpatha utterances for associations between different thematic role orders, intonational phrasing patterns and pitch downtrends. We found that initial constituents (Agents or Patients) tend to carry the highest pitch targets (HiF0), followed by patterns of downstep and declination. Sentence-final verbs always have lower Hif0 values than either initial or medial Agents or Patients. Thematic role order does not influence intonational
    patterns, with the results suggesting that Murrinhpatha has positional prosody, although final nominals can disrupt global
    pitch downtrends regardless of thematic role.
  • Floyd, S. (2006). The cash value of style in the Andean market. In E.-X. Lee, K. M. Markman, V. Newdick, & T. Sakuma (Eds.), SALSA 13: Texas Linguistic Forum vol. 49. Austin, TX: Texas Linguistics Forum.

    Abstract

    This paper examines code and style shifting during sales transactions based on two market case studies from highland Ecuador. Bringing together ideas of linguistic economy with work on stylistic variation and ethnohistorical research on Andean markets, I study bartering, market calls and sales pitches to show how sellers create stylistic performances distinguished by contrasts of code, register and poetic features. The interaction of the symbolic value of language with the economic values of the market presents a place to examine the relationship between discourse and the material world.
  • Furman, R., Ozyurek, A., & Allen, S. E. M. (2006). Learning to express causal events across languages: What do speech and gesture patterns reveal? In D. Bamman, T. Magnitskaia, & C. Zaller (Eds.), Proceedings of the 30th Annual Boston University Conference on Language Development (pp. 190-201). Somerville, Mass: Cascadilla Press.
  • Galke, L., & Scherp, A. (2022). Bag-of-words vs. graph vs. sequence in text classification: Questioning the necessity of text-graphs and the surprising strength of a wide MLP. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (pp. 4038-4051). Dublin: Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.279.
  • Galke, L., Cuber, I., Meyer, C., Nölscher, H. F., Sonderecker, A., & Scherp, A. (2022). General cross-architecture distillation of pretrained language models into matrix embedding. In Proceedings of the IEEE Joint Conference on Neural Networks (IJCNN 2022), part of the IEEE World Congress on Computational Intelligence (WCCI 2022). doi:10.1109/IJCNN55064.2022.9892144.

    Abstract

    Large pretrained language models (PreLMs) are rev-olutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive to DistilBERT on question similarity and recognizing textual entailment, but uses only half of the number of parameters and is three times faster in terms of inference speed. We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLM into competitive models and motivates further research in this direction.
  • Galke, L., Ram, Y., & Raviv, L. (2024). Learning pressures and inductive biases in emergent communication: Parallels between humans and deep neural networks. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 197-201). Nijmegen: The Evolution of Language Conferences.
  • Gamba, M., De Gregorio, C., Valente, D., Raimondi, T., Torti, V., Miaretsoa, L., Carugati, F., Friard, O., Giacoma, C., & Ravignani, A. (2022). Primate rhythmic categories analyzed on an individual basis. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 229-236). Nijmegen: Joint Conference on Language Evolution (JCoLE).

    Abstract

    Rhythm is a fundamental feature characterizing communicative displays, and recent studies showed that primate songs encompass categorical rhythms falling on small integer ratios observed in humans. We individually assessed the presence and sexual dimorphism of rhythmic categories, analyzing songs emitted by 39 wild indris. Considering the intervals between the units given during each song, we extracted 13556 interval ratios and found three peaks (at around 0.33, 0.47, and 0.70). Two peaks indicated rhythmic categories corresponding to small integer ratios (1:1, 2:1). All individuals showed a peak at 0.70, and
    most showed those at 0.47 and 0.33. In addition, we found sex differences in the peak at 0.47 only, with males showing lower values than females. This work investigates the presence of individual rhythmic categories in a non-human species; further research may highlight the significance of rhythmicity and untie selective pressures that guided its evolution across species, including humans.
  • Gazendam, L., Malaisé, V., Schreiber, G., & Brugman, H. (2006). Deriving semantic annotations of an audiovisual program from contextual texts. In First International Workshop on Semantic Web Annotations for Multimedia (SWAMM 2006).

    Abstract

    The aim of this paper is to explore whether indexing terms for an audiovisual program can be derived from contextual texts automatically. For this we apply natural-language processing techniques to contextual texts of two Dutch TV-programs. We use a Dutch domain thesaurus to derive possible metadata. This possible metadata is ranked by an algorithm which uses the relations of the thesaurus. We evaluate the results by comparing them to human made descriptions.
  • Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).

    Abstract

    Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.

    Additional information

    link to eScholarship
  • Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).

    Abstract

    Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and re-
    traction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.
  • Goudbeek, M., & Swingley, D. (2006). Saliency effects in distributional learning. In Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 478-482). Auckland: Australasian Speech Science and Technology Association.

    Abstract

    Acquiring the sounds of a language involves learning to recognize distributional patterns present in the input. We show that among adult learners, this distributional learning of auditory categories (which are conceived of here as probability density functions in a multidimensional space) is constrained by the salience of the dimensions that form the axes of this perceptual space. Only with a particular ratio of variation in the perceptual dimensions was category learning driven by the distributional properties of the input.
  • Grosseck, O., Perlman, M., Ortega, G., & Raviv, L. (2024). The iconic affordances of gesture and vocalization in emerging languages in the lab. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 223-225). Nijmegen: The Evolution of Language Conferences.
  • Gullberg, M., & Indefrey, P. (Eds.). (2006). The cognitive neuroscience of second language acquisition [Special Issue]. Language Learning, 56(suppl. 1).
  • Gullberg, M. (Ed.). (2006). Gestures and second language acquisition [Special Issue]. International Review of Applied Linguistics, 44(2).
  • Harbusch, K., & Kempen, G. (2006). ELLEIPO: A module that computes coordinative ellipsis for language generators that don't. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2006) (pp. 115-118).

    Abstract

    Many current sentence generators lack the ability to compute elliptical versions of coordinated clauses in accordance with the rules for Gapping, Forward and Backward Conjunction Reduction, and SGF (Subject Gap in clauses with Finite/ Fronted verb). We describe a module (implemented in JAVA, with German and Dutch as target languages) that takes non-elliptical coordinated clauses as input and returns all reduced versions licensed by coordinative ellipsis. It is loosely based on a new psycholinguistic theory of coordinative ellipsis proposed by Kempen. In this theory, coordinative ellipsis is not supposed to result from the application of declarative grammar rules for clause formation but from a procedural component that interacts with the sentence generator and may block the overt expression of certain constituents.
  • Harbusch, K., Kempen, G., Van Breugel, C., & Koch, U. (2006). A generation-oriented workbench for performance grammar: Capturing linear order variability in German and Dutch. In Proceedings of the 4th International Natural Language Generation Conference (pp. 9-11).

    Abstract

    We describe a generation-oriented workbench for the Performance Grammar (PG) formalism, highlighting the treatment of certain word order and movement constraints in Dutch and German. PG enables a simple and uniform treatment of a heterogeneous collection of linear order phenomena in the domain of verb constructions (variably known as Cross-serial Dependencies, Verb Raising, Clause Union, Extraposition, Third Construction, Particle Hopping, etc.). The central data structures enabling this feature are clausal “topologies”: one-dimensional arrays associated with clauses, whose cells (“slots”) provide landing sites for the constituents of the clause. Movement operations are enabled by unification of lateral slots of topologies at adjacent levels of the clause hierarchy. The PGW generator assists the grammar developer in testing whether the implemented syntactic knowledge allows all and only the well-formed permutations of constituents.
  • Herbst, L. E. (2006). The influence of language dominance on bilingual VOT: A case study. In Proceedings of the 4th University of Cambridge Postgraduate Conference on Language Research (CamLing 2006) (pp. 91-98). Cambridge: Cambridge University Press.

    Abstract

    Longitudinally collected VOT data from an early English-Italian bilingual who became increasingly English-dominant was analyzed. Stops in English were always produced with significantly longer VOT than in Italian. However, the speaker did not show any significant change in the VOT production in either language over time, despite the clear dominance of English in his every day language use later in his life. The results indicate that – unlike L2 learners – early bilinguals may remain unaffected by language use with respect to phonetic realization.
  • Hintz, F., Voeten, C. C., McQueen, J. M., & Meyer, A. S. (2022). Quantifying the relationships between linguistic experience, general cognitive skills and linguistic processing skills. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (Eds.), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 2491-2496). Toronto, Canada: Cognitive Science Society.

    Abstract

    Humans differ greatly in their ability to use language. Contemporary psycholinguistic theories assume that individual differences in language skills arise from variability in linguistic experience and in general cognitive skills. While much previous research has tested the involvement of select verbal and non-verbal variables in select domains of linguistic processing, comprehensive characterizations of the relationships among the skills underlying language use are rare. We contribute to such a research program by re-analyzing a publicly available set of data from 112 young adults tested on 35 behavioral tests. The tests assessed nine key constructs reflecting linguistic processing skills, linguistic experience and general cognitive skills. Correlation and hierarchical clustering analyses of the test scores showed that most of the tests assumed to measure the same construct correlated moderately to strongly and largely clustered together. Furthermore, the results suggest important roles of processing speed in comprehension, and of linguistic experience in production.
  • Hintz, F., & Meyer, A. S. (Eds.). (2024). Individual differences in language skills [Special Issue]. Journal of Cognition, 7(1).
  • Hoeksema, N., Hagoort, P., & Vernes, S. C. (2022). Piecing together the building blocks of the vocal learning bat brain. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 294-296). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Holler, J., & Stevens, R. (2006). How speakers represent size information in referential communication for knowing and unknowing recipients. In D. Schlangen, & R. Fernandez (Eds.), Brandial '06 Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, Germany, September 11-13.
  • Janse, E. (2003). Word perception in natural-fast and artificially time-compressed speech. In M. SolÉ, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of the Phonetic Sciences (pp. 3001-3004).
  • Johnson, E. K. (2003). Speaker intent influences infants' segmentation of potentially ambiguous utterances. In Proceedings of the 15th International Congress of Phonetic Sciences (PCPhS 2003) (pp. 1995-1998). Adelaide: Causal Productions.
  • Joshi, A., Mohanty, R., Kanakanti, M., Mangla, A., Choudhary, S., Barbate, M., & Modi, A. (2024). iSign: A benchmark for Indian Sign Language processing. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Findings of the Association for Computational Linguistics ACL 2024 (pp. 10827-10844). Bangkok, Thailand: Association for Computational Linguistics.

    Abstract

    Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, Sign Languages still need to catch up due to the need for more resources. To bridge this gap, in this work, we propose iSign: a benchmark for Indian Sign Language (ISL) Processing. We make three primary contributions to this work. First, we release one of the largest ISL-English datasets with more than video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with the baseline models for easier access to the research community. Third, we provide detailed insights into the proposed benchmarks with a few linguistic insights into the working of ISL. We streamline the evaluation of Sign Language processing, addressing the gaps in the NLP research community for Sign Languages. We release the dataset, tasks and models via the following website: https://exploration-lab.github.io/iSign/

    Additional information

    dataset, tasks, models
  • Josserand, M., Pellegrino, F., Grosseck, O., Dediu, D., & Raviv, L. (2024). Adapting to individual differences: An experimental study of variation in language evolution. In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 286-289). Nijmegen: The Evolution of Language Conferences.
  • Kan, U., Gökgöz, K., Sumer, B., Tamyürek, E., & Özyürek, A. (2022). Emergence of negation in a Turkish homesign system: Insights from the family context. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 387-389). Nijmegen: Joint Conference on Language Evolution (JCoLE).
  • Kempen, G., & Harbusch, K. (2003). A corpus study into word order variation in German subordinate clauses: Animacy affects linearization independently of function assignment. In Proceedings of AMLaP 2003 (pp. 153-154). Glasgow: Glasgow University.
  • Kempen, G. (1988). De netwerker: Spin in het web of rat in een doolhof? In SURF in theorie en praktijk: Van personal tot supercomputer (pp. 59-61). Amsterdam: Elsevier Science Publishers.
  • Kemps-Snijders, M., Ducret, J., Romary, L., & Wittenburg, P. (2006). An API for accessing the data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2299-2302).
  • Kemps-Snijders, M., Nederhof, M.-J., & Wittenburg, P. (2006). LEXUS, a web-based tool for manipulating lexical resources. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1862-1865).
  • Klassmann, A., Offenga, F., Broeder, D., Skiba, R., & Wittenburg, P. (2006). Comparison of resource discovery methods. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 113-116).
  • Klein, W. (1995). A simplest analysis of the English tense-aspect system. In W. Riehle, & H. Keiper (Eds.), Proceedings of the Anglistentag 1994 (pp. 139-151). Tübingen: Niemeyer.
  • Klein, W. (Ed.). (1995). Epoche [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (100).
  • Klein, W., & Franceschini, R. (Eds.). (2003). Einfache Sprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 131.
  • Klein, W. (Ed.). (1989). Kindersprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (73).
  • Klein, W. (Ed.). (1988). Sprache Kranker [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (69).
  • Kohatsu, T., Akamine, S., Sato, M., & Niikuni, K. (2022). Individual differences in empathy affect perspective adoption in language comprehension. In Proceedings of the 39th Annual Meeting of Japanese Cognitive Science Society (pp. 652-656). Tokyo: Japanese Cognitive Science Society.
  • Kuzla, C., Mitterer, H., Ernestus, M., & Cutler, A. (2006). Perceptual compensation for voice assimilation of German fricatives. In P. Warren, & I. Watson (Eds.), Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 394-399).

    Abstract

    In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: Compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts.
  • Kuzla, C., Ernestus, M., & Mitterer, H. (2006). Prosodic structure affects the production and perception of voice-assimilated German fricatives. In R. Hoffmann, & H. Mixdorff (Eds.), Speech prosody 2006. Dresden: TUD Press.

    Abstract

    Prosodic structure has long been known to constrain phonological processes [1]. More recently, it has also been recognized as a source of fine-grained phonetic variation of speech sounds. In particular, segments in domain-initial position undergo prosodic strengthening [2, 3], which also implies more resistance to coarticulation in higher prosodic domains [5]. The present study investigates the combined effects of prosodic strengthening and assimilatory devoicing on word-initial fricatives in German, the functional implication of both processes for cues to the fortis-lenis contrast, and the influence of prosodic structure on listeners’ compensation for assimilation. Results indicate that 1. Prosodic structure modulates duration and the degree of assimilatory devoicing, 2. Phonological contrasts are maintained by speakers, but differ in phonetic detail across prosodic domains, and 3. Compensation for assimilation in perception is moderated by prosodic structure and lexical constraints.
  • Kuzla, C. (2003). Prosodically-conditioned variation in the realization of domain-final stops and voicing assimilation of domain-initial fricatives in German. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2829-2832). Adelaide: Causal Productions.
  • Kuzla, C., Mitterer, H., & Ernestus, M. (2006). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 43-44).
  • Lammertink, I., De Heer Kloots, M., Bazioni, M., & Raviv, L. (2024). Learnability effects in children: Are more structured languages easier to learn? In J. Nölle, L. Raviv, K. E. Graham, S. Hartmann, Y. Jadoul, M. Josserand, T. Matzinger, K. Mudd, M. Pleyer, A. Slonimska, & S. Wacewicz (Eds.), The Evolution of Language: Proceedings of the 15th International Conference (EVOLANG XV) (pp. 320-323). Nijmegen: The Evolution of Language Conferences.
  • De Lange, F. P., Hagoort, P., & Toni, I. (2003). Differential fronto-parietal contributions to visual and motor imagery. NeuroImage, 19(2), e2094-e2095.

    Abstract

    Mental imagery is a cognitive process crucial to human reasoning. Numerous studies have characterized specific
    instances of this cognitive ability, as evoked by visual imagery (VI) or motor imagery (MI) tasks. However, it
    remains unclear which neural resources are shared between VI and MI, and which are exclusively related to MI.
    To address this issue, we have used fMRI to measure human brain activity during performance of VI and MI
    tasks. Crucially, we have modulated the imagery process by manipulating the degree of mental rotation necessary
    to solve the tasks. We focused our analysis on changes in neural signal as a function of the degree of mental
    rotation in each task.
  • Levelt, C. C., Fikkert, P., & Schiller, N. O. (2003). Metrical priming in speech production. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2481-2485). Adelaide: Causal Productions.

    Abstract

    In this paper we report on four experiments in which we attempted to prime the stress position of Dutch bisyllabic target nouns. These nouns, picture names, had stress on either the first or the second syllable. Auditory prime words had either the same stress as the target or a different stress (e.g., WORtel – MOtor vs. koSTUUM – MOtor; capital letters indicate stressed syllables in prime – target pairs). Furthermore, half of the prime words were semantically related, the other half were unrelated. In none of the experiments a stress priming effect was found. This could mean that stress is not stored in the lexicon. An additional finding was that targets with initial stress had a faster response than targets with a final stress. We hypothesize that bisyllabic words with final stress take longer to be encoded because this stress pattern is irregular with respect to the lexical distribution of bisyllabic stress patterns, even though it can be regular in terms of the metrical stress rules of Dutch.
  • Levelt, W. J. M. (1991). Lexical access in speech production: Stages versus cascading. In H. Peters, W. Hulstijn, & C. Starkweather (Eds.), Speech motor control and stuttering (pp. 3-10). Amsterdam: Excerpta Medica.
  • Levinson, S. C. (2006). On the human "interaction engine". In N. J. Enfield, & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 39-69). Oxford: Berg.
  • Liesenfeld, A., & Dingemanse, M. (2022). Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages. In Proceedings of Interspeech 2022 (pp. 1126-1130).

    Abstract

    Response tokens (also known as backchannels, continuers, or feedback) are a frequent feature of human interaction, where they serve to display understanding and streamline turn-taking. We propose a bottom-up method to study responsive behaviour across 16 languages (8 language families). We use sequential context and recurrence of turns formats to identify candidate response tokens in a language-agnostic way across diverse conversational corpora. We then use UMAP clustering directly on speech signals to represent structure and variation. We find that (i) written orthographic annotations underrepresent the attested variation, (ii) distinctions between formats can be gradient rather than discrete, (iii) most languages appear to make available a broad distinction between a minimal nasal format `mm' and a fuller `yeah’-like format. Charting this aspect of human interaction contributes to our understanding of interactional infrastructure across languages and can inform the design of speech technologies.
  • Liesenfeld, A., & Dingemanse, M. (2022). Building and curating conversational corpora for diversity-aware language science and technology. In F. Béchet, P. Blache, K. Choukri, C. Cieri, T. DeClerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, & J. Odijk (Eds.), Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022) (pp. 1178-1192). Marseille, France: European Language Resources Association.

    Abstract

    We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.
  • Liesenfeld, A., & Dingemanse, M. (2024). Rethinking open source generative AI: open-washing and the EU AI Act. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24) (pp. 1774-1784). ACM.

    Abstract

    The past year has seen a steep rise in generative AI systems that claim to be open. But how open are they really? The question of what counts as open source in generative AI is poised to take on particular importance in light of the upcoming EU AI Act that regulates open source systems differently, creating an urgent need for practical openness assessment. Here we use an evidence-based framework that distinguishes 14 dimensions of openness, from training datasets to scientific and technical documentation and from licensing to access methods. Surveying over 45 generative AI systems (both text and text-to-image), we find that while the term open source is widely used, many models are `open weight' at best and many providers seek to evade scientific, legal and regulatory scrutiny by withholding information on training and fine-tuning data. We argue that openness in generative AI is necessarily composite (consisting of multiple elements) and gradient (coming in degrees), and point out the risk of relying on single features like access or licensing to declare models open or not. Evidence-based openness assessment can help foster a generative AI landscape in which models can be effectively regulated, model providers can be held accountable, scientists can scrutinise generative AI, and end users can make informed decisions.

Share this page