Publications

Displaying 101 - 200 of 208
  • Kidd, E., Bavin, E. L., & Rhodes, B. (2001). Two-year-olds' knowledge of verbs and argument structures. In M. Almgren, A. Barreña, M.-J. Ezeuzabarrena, I. Idiazabal, & B. MacWhinney (Eds.), Research on child language acquisition: Proceedings of the 8th Conference of the International Association for the Study of Child language (pp. 1368-1382). Sommerville: Cascadilla Press.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and cospeech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaningbearing phase.
  • Klein, W. (2000). Changing concepts of the nature-nurture debate. In R. Hide, J. Mittelstrass, & W. Singer (Eds.), Changing concepts of nature at the turn of the millenium: Proceedings plenary session of the Pontifical academy of sciences, 26-29 October 1998 (pp. 289-299). Vatican City: Pontificia Academia Scientiarum.
  • Klein, W., & Schnell, R. (Eds.). (2008). Literaturwissenschaft und Linguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (150).
  • Klein, W. (Ed.). (1983). Intonation [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (49).
  • Klein, W. (Ed.). (2008). Ist Schönheit messbar? [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 152.
  • Klein, W. (Ed.). (1998). Kaleidoskop [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (112).
  • Klein, W. (Ed.). (1997). Technologischer Wandel in den Philologien [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (106).
  • Klein, W. (Ed.). (2000). Sprache des Rechts [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (118).
  • Klein, W. (Ed.). (1982). Zweitspracherwerb [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (45).
  • Koster, M., & Cutler, A. (1997). Segmental and suprasegmental contributions to spoken-word recognition in Dutch. In Proceedings of EUROSPEECH 97 (pp. 2167-2170). Grenoble, France: ESCA.

    Abstract

    Words can be distinguished by segmental differences or by suprasegmental differences or both. Studies from English suggest that suprasegmentals play little role in human spoken-word recognition; English stress, however, is nearly always unambiguously coded in segmental structure (vowel quality); this relationship is less close in Dutch. The present study directly compared the effects of segmental and suprasegmental mispronunciation on word recognition in Dutch. There was a strong effect of suprasegmental mispronunciation, suggesting that Dutch listeners do exploit suprasegmental information in word recognition. Previous findings indicating the effects of mis-stressing for Dutch differ with stress position were replicated only when segmental change was involved, suggesting that this is an effect of segmental rather than suprasegmental processing.
  • Kreuzer, H. (Ed.). (1971). Methodische Perspektiven [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (1/2).
  • Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. (2000). On forgetful attractor network memories. In H. Malmgren, M. Borga, & L. Niklasson (Eds.), Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-1 Conference, Göteborg, Sweden, 13-16 May 2000 (pp. 54-62). Heidelberg: Springer Verlag.

    Abstract

    A recurrently connected attractor neural network with a Hebbian learning rule is currently our best ANN analogy for a piece cortex. Functionally biological memory operates on a spectrum of time scales with regard to induction and retention, and it is modulated in complex ways by sub-cortical neuromodulatory systems. Moreover, biological memory networks are commonly believed to be highly distributed and engage many co-operating cortical areas. Here we focus on the temporal aspects of induction and retention of memory in a connectionist type attractor memory model of a piece of cortex. A continuous time, forgetful Bayesian-Hebbian learning rule is described and compared to the characteristics of LTP and LTD seen experimentally. More generally, an attractor network implementing this learning rule can operate as a long-term, intermediate-term, or short-term memory. Modulation of the print-now signal of the learning rule replicates some experimental memory phenomena, like e.g. the von Restorff effect.
  • Lausberg, H., & Kita, S. (2001). Hemispheric specialization in nonverbal gesticulation investigated in patients with callosal disconnection. In C. Cavé, I. Guaïtella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication. Actes du colloque ORAGE 2001 (pp. 266-270). Paris, France: Éditions L'Harmattan.
  • Lenkiewicz, P., Auer, E., Schreer, O., Masneri, S., Schneider, D., & Tschöpe, S. (2012). AVATecH ― automated annotation through audio and video analysis. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 209-214). European Language Resources Association.

    Abstract

    In different fields of the humanities annotations of multimodal resources are a necessary component of the research workflow. Examples include linguistics, psychology, anthropology, etc. However, creation of those annotations is a very laborious task, which can take 50 to 100 times the length of the annotated media, or more. This can be significantly improved by applying innovative audio and video processing algorithms, which analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, which is a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2008). Accelerating 3D medical image segmentation with high performance computing. In Proceedings of the IEEE International Workshops on Image Processing Theory, Tools and Applications - IPT (pp. 1-8).

    Abstract

    Digital processing of medical images has helped physicians and patients during past years by allowing examination and diagnosis on a very precise level. Nowadays possibly the biggest deal of support it can offer for modern healthcare is the use of high performance computing architectures to treat the huge amounts of data that can be collected by modern acquisition devices. This paper presents a parallel processing implementation of an image segmentation algorithm that operates on a computer cluster equipped with 10 processing units. Thanks to well-organized distribution of the workload we manage to significantly shorten the execution time of the developed algorithm and reach a performance gain very close to linear.
  • Lenkiewicz, A., Lis, M., & Lenkiewicz, P. (2012). Linguistic concepts described with Media Query Language for automated annotation. In J. C. Meiser (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 477-479).

    Abstract

    Introduction Human spoken communication is multimodal, i.e. it encompasses both speech and gesture. Acoustic properties of voice, body movements, facial expression, etc. are an inherent and meaningful part of spoken interaction; they can provide attitudinal, grammatical and semantic information. In the recent years interest in audio-visual corpora has been rising rapidly as they enable investigation of different communicative modalities and provide more holistic view on communication (Kipp et al. 2009). Moreover, for some languages such corpora are the only available resource, as is the case for endangered languages for which no written resources exist.
  • Lenkiewicz, P., Van Uytvanck, D., Wittenburg, P., & Drude, S. (2012). Towards automated annotation of audio and video recordings by application of advanced web-services. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1880-1883).

    Abstract

    In this paper we describe audio and video processing algorithms that are developed in the scope of AVATecH project. The purpose of these algorithms is to shorten the time taken by manual annotation of audio and video recordings by extracting features from media files and creating semi-automated annotations. We show that the use of such supporting algorithms can shorten the annotation time to 30-50% of the time necessary to perform a fully manual annotation of the same kind.
  • Levelt, W. J. M. (1983). The speaker's organization of discourse. In Proceedings of the XIIIth International Congress of Linguists (pp. 278-290).
  • Levinson, S. C. (2000). Language as nature and language as art. In J. Mittelstrass, & W. Singer (Eds.), Proceedings of the Symposium on ‘Changing concepts of nature and the turn of the Millennium (pp. 257-287). Vatican City: Pontificae Academiae Scientiarium Scripta Varia.
  • Levinson, S. C. (2000). H.P. Grice on location on Rossel Island. In S. S. Chang, L. Liaw, & J. Ruppenhofer (Eds.), Proceedings of the 25th Annual Meeting of the Berkeley Linguistic Society (pp. 210-224). Berkeley: Berkeley Linguistic Society.
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Majid, A. (2012). Taste in twenty cultures [Abstract]. Abstracts from the XXIth Congress of European Chemoreception Research Organization, ECRO-2011. Publ. in Chemical Senses, 37(3), A10.

    Abstract

    Scholars disagree about the extent to which language can tell us
    about conceptualisation of the world. Some believe that language
    is a direct window onto concepts: Having a word ‘‘bird’’, ‘‘table’’ or
    ‘‘sour’’ presupposes the corresponding underlying concept, BIRD,
    TABLE, SOUR. Others disagree. Words are thought to be uninformative,
    or worse, misleading about our underlying conceptual representations;
    after all, our mental worlds are full of ideas that we
    struggle to express in language. How could this be so, argue sceptics,
    if language were a direct window on our inner life? In this presentation,
    I consider what language can tell us about the
    conceptualisation of taste. By considering linguistic data from
    twenty unrelated cultures – varying in subsistence mode (huntergatherer
    to industrial), ecological zone (rainforest jungle to desert),
    dwelling type (rural and urban), and so forth – I argue any single language is, indeed, impoverished about what it can reveal about
    taste. But recurrent lexicalisation patterns across languages can
    provide valuable insights about human taste experience. Moreover,
    language patterning is part of the data that a good theory of taste
    perception has to be answerable for. Taste researchers, therefore,
    cannot ignore the crosslinguistic facts.
  • Majid, A., Boroditsky, L., & Gaby, A. (Eds.). (2012). Time in terms of space [Research topic] [Special Issue]. Frontiers in cultural psychology. Retrieved from http://www.frontiersin.org/cultural_psychology/researchtopics/Time_in_terms_of_space/755.

    Abstract

    This Research Topic explores the question: what is the relationship between representations of time and space in cultures around the world? This question touches on the broader issue of how humans come to represent and reason about abstract entities – things we cannot see or touch. Time is a particularly opportune domain to investigate this topic. Across cultures, people use spatial representations for time, for example in graphs, time-lines, clocks, sundials, hourglasses, and calendars. In language, time is also heavily related to space, with spatial terms often used to describe the order and duration of events. In English, for example, we might move a meeting forward, push a deadline back, attend a long concert or go on a short break. People also make consistent spatial gestures when talking about time, and appear to spontaneously invoke spatial representations when processing temporal language. A large body of evidence suggests a close correspondence between temporal and spatial language and thought. However, the ways that people spatialize time can differ dramatically across languages and cultures. This research topic identifies and explores some of the sources of this variation, including patterns in spatial thinking, patterns in metaphor, gesture and other cultural systems. This Research Topic explores how speakers of different languages talk about time and space and how they think about these domains, outside of language. The Research Topic invites papers exploring the following issues: 1. Do the linguistic representations of space and time share the same lexical and morphosyntactic resources? 2. To what extent does the conceptualization of time follow the conceptualization of space?
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • McQueen, J. M., Norris, D., & Cutler, A. (2001). Can lexical knowledge modulate prelexical representations over time? In R. Smits, J. Kingston, T. Neary, & R. Zondervan (Eds.), Proceedings of the workshop on Speech Recognition as Pattern Classification (SPRAAC) (pp. 145-150). Nijmegen: Max Planck Institute for Psycholinguistics.

    Abstract

    The results of a study on perceptual learning are reported. Dutch subjects made lexical decisions on a list of words and nonwords. Embedded in the list were either [f]- or [s]-final words in which the final fricative had been replaced by an ambiguous sound, midway between [f] and [s]. One group of listeners heard ambiguous [f]- final Dutch words like [kara?] (based on karaf, carafe) and unambiguous [s]-final words (e.g., karkas, carcase). A second group heard the reverse (e.g., ambiguous [karka?] and unambiguous karaf). After this training phase, listeners labelled ambiguous fricatives on an [f]- [s] continuum. The subjects who had heard [?] in [f]- final words categorised these fricatives as [f] reliably more often than those who had heard [?] in [s]-final words. These results suggest that speech recognition is dynamic: the system adjusts to the constraints of each particular listening situation. The lexicon can provide this adjustment process with a training signal.
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Positive and negative influences of the lexicon on phonemic decision-making. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 778-781). Beijing: China Military Friendship Publish.

    Abstract

    Lexical knowledge influences how human listeners make decisions about speech sounds. Positive lexical effects (faster responses to target sounds in words than in nonwords) are robust across several laboratory tasks, while negative effects (slower responses to targets in more word-like nonwords than in less word-like nonwords) have been found in phonetic decision tasks but not phoneme monitoring tasks. The present experiments tested whether negative lexical effects are therefore a task-specific consequence of the forced choice required in phonetic decision. We compared phoneme monitoring and phonetic decision performance using the same Dutch materials in each task. In both experiments there were positive lexical effects, but no negative lexical effects. We observe that in all studies showing negative lexical effects, the materials were made by cross-splicing, which meant that they contained perceptual evidence supporting the lexically-consistent phonemes. Lexical knowledge seems to influence phonemic decision-making only when there is evidence for the lexically-consistent phoneme in the speech signal.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Why Merge really is autonomous and parsimonious. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 47-50). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    We briefly describe the Merge model of phonemic decision-making, and, in the light of general arguments about the possible role of feedback in spoken-word recognition, defend Merge's feedforward structure. Merge not only accounts adequately for the data, without invoking feedback connections, but does so in a parsimonious manner.
  • Mitterer, H. (Ed.). (2012). Ecological aspects of speech perception [Research topic] [Special Issue]. Frontiers in Cognition.

    Abstract

    Our knowledge of speech perception is largely based on experiments conducted with carefully recorded clear speech presented under good listening conditions to undistracted listeners - a near-ideal situation, in other words. But the reality poses a set of different challenges. First of all, listeners may need to divide their attention between speech comprehension and another task (e.g., driving). Outside the laboratory, the speech signal is often slurred by less than careful pronunciation and the listener has to deal with background noise. Moreover, in a globalized world, listeners need to understand speech in more than their native language. Relatedly, the speakers we listen to often have a different language background so we have to deal with a foreign or regional accent we are not familiar with. Finally, outside the laboratory, speech perception is not an end in itself, but rather a mean to contribute to a conversation. Listeners do not only need to understand the speech they are hearing, they also need to use this information to plan and time their own responses. For this special topic, we invite papers that address any of these ecological aspects of speech perception.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Moore, R. K., & Cutler, A. (2001). Constraints on theories of human vs. machine recognition of speech. In R. Smits, J. Kingston, T. Neary, & R. Zondervan (Eds.), Proceedings of the workshop on Speech Recognition as Pattern Classification (SPRAAC) (pp. 145-150). Nijmegen: Max Planck Institute for Psycholinguistics.

    Abstract

    The central issues in the study of speech recognition by human listeners (HSR) and of automatic speech recognition (ASR) are clearly comparable; nevertheless the research communities that concern themselves with ASR and HSR are largely distinct. This paper compares the research objectives of the two fields, and attempts to draw informative lessons from one to the other.
  • Namjoshi, J., Tremblay, A., Broersma, M., Kim, S., & Cho, T. (2012). Influence of recent linguistic exposure on the segmentation of an unfamiliar language [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1968.

    Abstract

    Studies have shown that listeners segmenting unfamiliar languages transfer native-language (L1) segmentation cues. These studies, however, conflated L1 and recent linguistic exposure. The present study investigates the relative influences of L1 and recent linguistic exposure on the use of prosodic cues for segmenting an artificial language (AL). Participants were L1-French listeners, high-proficiency L2-French L1-English listeners, and L1-English listeners without functional knowledge of French. The prosodic cue assessed was F0 rise, which is word-final in French, but in English tends to be word-initial. 30 participants heard a 20-minute AL speech stream with word-final boundaries marked by F0 rise, and decided in a subsequent listening task which of two words (without word-final F0 rise) had been heard in the speech stream. The analyses revealed a marginally significant effect of L1 (all listeners) and, importantly, a significant effect of recent linguistic exposure (L1-French and L2-French listeners): accuracy increased with decreasing time in the US since the listeners’ last significant (3+ months) stay in a French-speaking environment. Interestingly, no effect of L2 proficiency was found (L2-French listeners).
  • Nordhoff, S., & Hammarström, H. (2012). Glottolog/Langdoc: Increasing the visibility of grey literature for low-density languages. In N. Calzolari (Ed.), Proceedings of the 8th International Conference on Language Resources and Evaluation [LREC 2012], May 23-25, 2012 (pp. 3289-3294). [Paris]: ELRA.

    Abstract

    Language resources can be divided into structural resources treating phonology, morphosyntax, semantics etc. and resources treating the social, demographic, ethnic, political context. A third type are meta-resources, like bibliographies, which provide access to the resources of the first two kinds. This poster will present the Glottolog/Langdoc project, a comprehensive bibliography providing web access to 180k bibliographical records to (mainly) low visibility resources from low-density languages. The resources are annotated for macro-area, content language, and document type and are available in XHTML and RDF.
  • Norris, D., Cutler, A., McQueen, J. M., Butterfield, S., & Kearns, R. K. (2000). Language-universal constraints on the segmentation of English. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 43-46). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) [1] is a language-specific or language-universal strategy for the segmentation of continuous speech. The PWC disfavours parses which leave an impossible residue between the end of a candidate word and a known boundary. The experiments examined cases where the residue was either a CV syllable with a lax vowel, or a CVC syllable with a schwa. Although neither syllable context is a possible word in English, word-spotting in both contexts was easier than with a context consisting of a single consonant. The PWC appears to be language-universal rather than language-specific.
  • Norris, D., Cutler, A., & McQueen, J. M. (2000). The optimal architecture for simulating spoken-word recognition. In C. Davis, T. Van Gelder, & R. Wales (Eds.), Cognitive Science in Australia, 2000: Proceedings of the Fifth Biennial Conference of the Australasian Cognitive Science Society. Adelaide: Causal Productions.

    Abstract

    Simulations explored the inability of the TRACE model of spoken-word recognition to model the effects on human listening of subcategorical mismatch in word forms. The source of TRACE's failure lay not in interactive connectivity, not in the presence of inter-word competition, and not in the use of phonemic representations, but in the need for continuously optimised interpretation of the input. When an analogue of TRACE was allowed to cycle to asymptote on every slice of input, an acceptable simulation of the subcategorical mismatch data was achieved. Even then, however, the simulation was not as close as that produced by the Merge model, which has inter-word competition, phonemic representations and continuous optimisation (but no interactive connectivity).
  • Otake, T., & Cutler, A. (2000). A set of Japanese word cohorts rated for relative familiarity. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 766-769). Beijing: China Military Friendship Publish.

    Abstract

    A database is presented of relative familiarity ratings for 24 sets of Japanese words, each set comprising words overlapping in the initial portions. These ratings are useful for the generation of material sets for research in the recognition of spoken words.
  • Otake, T., & Cutler, A. (2001). Recognition of (almost) spoken words: Evidence from word play in Japanese. In P. Dalsgaard (Ed.), Proceedings of EUROSPEECH 2001 (pp. 465-468).

    Abstract

    Current models of spoken-word recognition assume automatic activation of multiple candidate words fully or partially compatible with the speech input. We propose that listeners make use of this concurrent activation in word play such as punning. Distortion in punning should ideally involve no more than a minimal contrastive deviation between two words, namely a phoneme. Moreover, we propose that this metric of similarity does not presuppose phonemic awareness on the part of the punster. We support these claims with an analysis of modern and traditional puns in Japanese (in which phonemic awareness in language users is not encouraged by alphabetic orthography). For both data sets, the results support the predictions. Punning draws on basic processes of spokenword recognition, common across languages.
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Ozyurek, A., & Ozcaliskan, S. (2000). How do children learn to conflate manner and path in their speech and gestures? Differences in English and Turkish. In E. V. Clark (Ed.), The proceedings of the Thirtieth Child Language Research Forum (pp. 77-85). Stanford: CSLI Publications.
  • Ozyurek, A. (2001). What do speech-gesture mismatches reveal about language specific processing? A comparison of Turkish and English. In C. Cavé, I. Guaitella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication: Actes du Colloque ORAGE 2001 (pp. 567-581). Paris: L'Harmattan.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Poellmann, K., McQueen, J. M., & Mitterer, H. (2012). How talker-adaptation helps listeners recognize reduced word-forms [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 2053.

    Abstract

    Two eye-tracking experiments tested whether native listeners can adapt
    to reductions in casual Dutch speech. Listeners were exposed to segmental
    ([b] > [m]), syllabic (full-vowel-deletion), or no reductions. In a subsequent
    test phase, all three listener groups were tested on how efficiently they could
    recognize both types of reduced words. In the first Experiment’s exposure
    phase, the (un)reduced target words were predictable. The segmental reductions
    were completely consistent (i.e., involved the same input sequences).
    Learning about them was found to be pattern-specific and generalized in the
    test phase to new reduced /b/-words. The syllabic reductions were not consistent
    (i.e., involved variable input sequences). Learning about them was
    weak and not pattern-specific. Experiment 2 examined effects of word repetition
    and predictability. The (un-)reduced test words appeared in the exposure
    phase and were not predictable. There was no evidence of learning for
    the segmental reductions, probably because they were not predictable during
    exposure. But there was word-specific learning for the vowel-deleted words.
    The results suggest that learning about reductions is pattern-specific and
    generalizes to new words if the input is consistent and predictable. With
    variable input, there is more likely to be adaptation to a general speaking
    style and word-specific learning.
  • Ravignani, A., & Fitch, W. T. (2012). Sonification of experimental parameters as a new method for efficient coding of behavior. In A. Spink, F. Grieco, O. E. Krips, L. W. S. Loijens, L. P. P. J. Noldus, & P. H. Zimmerman (Eds.), Measuring Behavior 2012, 8th International Conference on Methods and Techniques in Behavioral Research (pp. 376-379).

    Abstract

    Cognitive research is often focused on experimental condition-driven reactions. Ethological studies frequently
    rely on the observation of naturally occurring specific behaviors. In both cases, subjects are filmed during the
    study, so that afterwards behaviors can be coded on video. Coding should typically be blind to experimental
    conditions, but often requires more information than that present on video. We introduce a method for blindcoding
    of behavioral videos that takes care of both issues via three main innovations. First, of particular
    significance for playback studies, it allows creation of a “soundtrack” of the study, that is, a track composed of
    synthesized sounds representing different aspects of the experimental conditions, or other events, over time.
    Second, it facilitates coding behavior using this audio track, together with the possibly muted original video.
    This enables coding blindly to conditions as required, but not ignoring other relevant events. Third, our method
    makes use of freely available, multi-platform software, including scripts we developed.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Roberts, L., & Meyer, A. S. (Eds.). (2012). Individual differences in second language acquisition [Special Issue]. Language Learning, 62(Supplement S2).
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., Sturm, J., & Boves, L. (2001). Business listings in automatic directory assistance. In Interspeech - Eurospeech 2001 - 7th European Conference on Speech Communication and Technology (pp. 2381-2384). ISCA Archive.

    Abstract

    So far most attempts to automate Directory Assistance services focused on private listings, because it is not known precisely how callers will refer to a business listings. The research described in this paper, carried out in the SMADA project, tries to fill this gap. The aim of the research is to model the expressions people use when referring to a business listing by means of rules, in order to automatically create a vocabulary, which can be part of an automated DA service. In this paper a rule-base procedure is proposed, which derives rules from the expressions people use. These rules are then used to automatically create expressions from directory listings. Two categories of businesses, viz. hospitals and the hotel and catering industry, are used to explain this procedure. Results for these two categories are used to discuss the problem of the over- and undergeneration of expressions.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O., Witteman, M. J., & Weber, A. (2012). Computational modelling of the recognition of foreign-accented speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 882 -885).

    Abstract

    In foreign-accented speech, pronunciation typically deviates from the canonical form to some degree. For native listeners, it has been shown that word recognition is more difficult for strongly-accented words than for less strongly-accented words. Furthermore recognition of strongly-accented words becomes easier with additional exposure to the foreign accent. In this paper, listeners’ behaviour was simulated with Fine-tracker, a computational model of word recognition that uses real speech as input. The simulations showed that, in line with human listeners, 1) Fine-Tracker’s recognition outcome is modulated by the degree of accentedness and 2) it improves slightly after brief exposure with the accent. On the level of individual words, however, Fine-tracker failed to correctly simulate listeners’ behaviour, possibly due to differences in overall familiarity with the chosen accent (German-accented Dutch) between human listeners and Fine-Tracker.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, which are recorded in a wide variety of acoustic conditions. The results show, that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Scharenborg, O., & Janse, E. (2012). Hearing loss and the use of acoustic cues in phonetic categorisation of fricatives. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1458-1461).

    Abstract

    Aging often affects sensitivity to the higher frequencies, which results in the loss of sensitivity to phonetic detail in speech. Hearing loss may therefore interfere with the categorisation of two consonants that have most information to differentiate between them in those higher frequencies and less in the lower frequencies, e.g., /f/ and /s/. We investigate two acoustic cues, i.e., formant transitions and fricative intensity, that older listeners might use to differentiate between /f/ and /s/. The results of two phonetic categorisation tasks on 38 older listeners (aged 60+) with varying degrees of hearing loss indicate that older listeners seem to use formant transitions as a cue to distinguish /s/ from /f/. Moreover, this ability is not impacted by hearing loss. On the other hand, listeners with increased hearing loss seem to rely more on intensity for fricative identification. Thus, progressive hearing loss may lead to gradual changes in perceptual cue weighting.
  • Scharenborg, O., Janse, E., & Weber, A. (2012). Perceptual learning of /f/-/s/ by older listeners. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 398-401).

    Abstract

    Young listeners can quickly modify their interpretation of a speech sound when a talker produces the sound ambiguously. Young Dutch listeners rely mainly on the higher frequencies to distinguish between /f/ and /s/, but these higher frequencies are particularly vulnerable to age-related hearing loss. We therefore tested whether older Dutch listeners can show perceptual retuning given an ambiguous pronunciation in between /f/ and /s/. Results of a lexically-guided perceptual learning experiment showed that older Dutch listeners are still able to learn non-standard pronunciations of /f/ and /s/. Possibly, the older listeners have learned to rely on other acoustic cues, such as formant transitions, to distinguish between /f/ and /s/. However, the size and duration of the perceptual effect is influenced by hearing loss, with listeners with poorer hearing showing a smaller and a shorter-lived learning effect.
  • Schiller, N. O., Van Lieshout, P. H. H. M., Meyer, A. S., & Levelt, W. J. M. (1997). Is the syllable an articulatory unit in speech production? Evidence from an Emma study. In P. Wille (Ed.), Fortschritte der Akustik: Plenarvorträge und Fachbeiträge der 23. Deutschen Jahrestagung für Akustik (DAGA 97) (pp. 605-606). Oldenburg: DEGA.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, D. R., & Cutler, A. (1982). Segmental cues to syntactic structure. In Proceedings of the Institute of Acoustics 'Spectral Analysis and its Use in Underwater Acoustics' (pp. E3.1-E3.4). London: Institute of Acoustics.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Seuren, P. A. M. (2001). Lexical meaning and metaphor. In E. N. Enikö (Ed.), Cognition in language use (pp. 422-431). Antwerp, Belgium: International Pragmatics Association (IPrA).
  • Seuren, P. A. M. (1971). Qualche osservazione sulla frase durativa e iterativa in italiano. In M. Medici, & R. Simone (Eds.), Grammatica trasformazionale italiana (pp. 209-224). Roma: Bulzoni.
  • Seuren, P. A. M. (1982). Riorientamenti metodologici nello studio della variabilità linguistica. In D. Gambarara, & A. D'Atri (Eds.), Ideologia, filosofia e linguistica: Atti del Convegno Internazionale di Studi, Rende (CS) 15-17 Settembre 1978 ( (pp. 499-515). Roma: Bulzoni.
  • Seyfeddinipur, M., & Kita, S. (2001). Gestures and dysfluencies in speech. In C. Cavé, I. Guaïtella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication. Actes du colloque ORAGE 2001 (pp. 266-270). Paris, France: Éditions L'Harmattan.
  • Sjerps, M. J., McQueen, J. M., & Mitterer, H. (2012). Extrinsic normalization for vocal tracts depends on the signal, not on attention. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 394-397).

    Abstract

    When perceiving vowels, listeners adjust to speaker-specific vocal-tract characteristics (such as F1) through "extrinsic vowel normalization". This effect is observed as a shift in the location of categorization boundaries of vowel continua. Similar effects have been found with non-speech. Non-speech materials, however, have consistently led to smaller effect-sizes, perhaps because of a lack of attention to non-speech. The present study investigated this possibility. Non-speech materials that had previously been shown to elicit reduced normalization effects were tested again, with the addition of an attention manipulation. The results show that increased attention does not lead to increased normalization effects, suggesting that vowel normalization is mainly determined by bottom-up signal characteristics.
  • Sloetjes, H., & Somasundaram, A. (2012). ELAN development, keeping pace with communities' needs. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 219-223). European Language Resources Association (ELRA).

    Abstract

    ELAN is a versatile multimedia annotation tool that is being developed at the Max Planck Institute for Psycholinguistics. About a decade ago it emerged out of a number of corpus tools and utilities and it has been extended ever since. This paper focuses on the efforts made to ensure that the application keeps up with the growing needs of that era in linguistics and multimodality research; growing needs in terms of length and resolution of recordings, the number of recordings made and transcribed and the number of levels of annotation per transcription.
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN
  • De Sousa, H. (2008). The development of echo-subject markers in Southern Vanuatu. In T. J. Curnow (Ed.), Selected papers from the 2007 Conference of the Australian Linguistic Society. Australian Linguistic Society.

    Abstract

    One of the defining features of the Southern Vanuatu language family is the echo-subject (ES) marker (Lynch 2001: 177-178). Canonically, an ES marker indicates that the subject of the clause is coreferential with the subject of the preceding clause. This paper begins with a survey of the various ES systems found in Southern Vanuatu. Two prominent differences amongst the ES systems are: a) the level of obligatoriness of the ES marker; and b) the level of grammatical integration between an ES clauses and the preceding clause. The variation found amongst the ES systems reveals a clear path of grammaticalisation from the VP coordinator *ma in Proto–Southern Vanuatu to the various types of ES marker in contemporary Southern Vanuatu languages
  • Stehouwer, H., Durco, M., Auer, E., & Broeder, D. (2012). Federated search: Towards a common search infrastructure. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3255-3259). European Language Resources Association (ELRA).

    Abstract

    Within scientific institutes there exist many language resources. These resources are often quite specialized and relatively unknown. The current infrastructural initiatives try to tackle this issue by collecting metadata about the resources and establishing centers with stable repositories to ensure the availability of the resources. It would be beneficial if the researcher could, by means of a simple query, determine which resources and which centers contain information useful to his or her research, or even work on a set of distributed resources as a virtual corpus. In this article we propose an architecture for a distributed search environment allowing researchers to perform searches in a set of distributed language resources.
  • Stehouwer, H., & Van den Bosch, A. (2008). Putting the t where it belongs: Solving a confusion problem in Dutch. In S. Verberne, H. Van Halteren, & P.-A. Coppen (Eds.), Computational Linguistics in the Netherlands 2007: Selected Papers from the 18th CLIN Meeting (pp. 21-36). Utrecht: LOT.

    Abstract

    A common Dutch writing error is to confuse a word ending in -d with a neighbor word ending in -dt. In this paper we describe the development of a machine-learning-based disambiguator that can determine which word ending is appropriate, on the basis of its local context. We develop alternative disambiguators, varying between a single monolithic classifier and having multiple confusable experts disambiguate between confusable pairs. Disambiguation accuracy of the best developed disambiguators exceeds 99%; when we apply these disambiguators to an external test set of collected errors, our detection strategy correctly identifies up to 79% of the errors.
  • Sumer, B., Zwitserlood, I., Perniss, P. M., & Ozyurek, A. (2012). Development of locative expressions by Turkish deaf and hearing children: Are there modality effects? In A. K. Biller, E. Y. Chung, & A. E. Kimball (Eds.), Proceedings of the 36th Annual Boston University Conference on Language Development (BUCLD 36) (pp. 568-580). Boston: Cascadilla Press.
  • Svantesson, J.-O., Burenhult, N., Holmer, A., Karlsson, A., & Lundström, H. (Eds.). (2012). Humanities of the lesser-known: New directions in the description, documentation and typology of endangered languages and musics [Special Issue]. Language Documentation and Description, 10.
  • Ten Bosch, L., & Scharenborg, O. (2012). Modeling cue trading in human word recognition. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2003-2006).

    Abstract

    Classical phonetic studies have shown that acoustic-articulatory cues can be interchanged without affecting the resulting phoneme percept (‘cue trading’). Cue trading has so far mainly been investigated in the context of phoneme identification. In this study, we investigate cue trading in word recognition, because words are the units of speech through which we communicate. This paper aims to provide a method to quantify cue trading effects by using a computational model of human word recognition. This model takes the acoustic signal as input and represents speech using articulatory feature streams. Importantly, it allows cue trading and underspecification. Its set-up is inspired by the functionality of Fine-Tracker, a recent computational model of human word recognition. This approach makes it possible, for the first time, to quantify cue trading in terms of a trade-off between features and to investigate cue trading in the context of a word recognition task.
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In C. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed
    over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning
    behind this initiative and how in practice such an archive is set up.
  • Turco, G., & Gubian, M. (2012). L1 Prosodic transfer and priming effects: A quantitative study on semi-spontaneous dialogues. In Q. Ma, H. Ding, & D. Hirst (Eds.), Proceedings of the 6th International Conference on Speech Prosody (pp. 386-389). International Speech Communication Association (ISCA).

    Abstract

    This paper represents a pilot investigation of primed accentuation patterns produced by advanced Dutch speakers of Italian as a second language (L2). Contrastive accent patterns within prepositional phrases were elicited in a semispontaneous dialogue entertained with a confederate native speaker of Italian. The aim of the analysis was to compare learner’s contrastive accentual configurations induced by the confederate speaker’s prime against those produced by Italian and Dutch natives in the same testing conditions. F0 and speech rate data were analysed by applying powerful datadriven techniques available in the Functional Data Analysis statistical framework. Results reveal different accentual configurations in L1 and L2 Italian in response to the confederate’s prime. We conclude that learner’s accentual patterns mirror those ones produced by their L1 control group (prosodic-transfer hypothesis) although the hypothesis of a transient priming effect on learners’ choice of contrastive patterns cannot be completely ruled out.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and access directly the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs by making use of practical examples. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data
  • Van Valin Jr., R. D. (2000). Focus structure or abstract syntax? A role and reference grammar account of some ‘abstract’ syntactic phenomena. In Z. Estrada Fernández, & I. Barreras Aguilar (Eds.), Memorias del V Encuentro Internacional de Lingüística en el Noroeste: (2 v.) Estudios morfosintácticos (pp. 39-62). Hermosillo: Editorial Unison.
  • Van de Weijer, J. (1997). Language input to a prelingual infant. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 conference on language acquisition (pp. 290-293). Edinburgh University Press.

    Abstract

    Pitch, intonation, and speech rate were analyzed in a collection of everyday speech heard by one Dutch infant between the ages of six and nine months. Components of each of these variables were measured in the speech of three adult speakers (mother, father, baby-sitter) when they addressed the infant, and when they addressed another adult. The results are in line with previously reported findings which are usually based on laboratory or prearranged settings: infant-directed speech in a natural setting exhibits more pitch variation, a larger number of simple intonation contours, and slower speech rate than does adult-directed speech.
  • Van Heuven, V. J., Haan, J., Janse, E., & Van der Torre, E. J. (1997). Perceptual identification of sentence type and the time-distribution of prosodic interrogativity markers in Dutch. In Proceedings of the ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications, Athens, Greece, 1997 (pp. 317-320).

    Abstract

    Dutch distinguishes at least four sentence types: statements and questions, the latter type being subdivided into wh-questions (beginning with a question word), yes/no-questions (with inversion of subject and finite), and declarative questions (lexico-syntactically identical to statement). Acoustically, each of these (sub)types was found to have clearly distinct global F0-patterns, as well as a characteristic distribution of final rises [1,2]. The present paper explores the separate contribution of parameters of global downtrend and size of accent-lending pitch movements versus aspects of the terminal rise to the human identification of the four sentence (sub)types, at various positions in the time-course of the utterance. The results show that interrogativity in Dutch can be identified at an early point in the utterance. However, wh-questions are not distinct from statements.
  • Van Uytvanck, D., Stehouwer, H., & Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 1029-1034). European Language Resources Association (ELRA).

    Abstract

    In this paper we present the Virtual Language Observatory (VLO), a metadata-based portal for language resources. It is completely based on the Component Metadata (CMDI) and ISOcat standards. This approach allows for the use of heterogeneous metadata schemas while maintaining the semantic compatibility. We describe the metadata harvesting process, based on OAI-PMH, and the conversion from several formats (OLAC, IMDI and the CLARIN LRT inventory) to their CMDI counterpart profiles. Then we focus on some post-processing steps to polish the harvested records. Next, the ingestion of the CMDI files into the VLO facet browser is described. We also include an overview of the changes since the first version of the VLO, based on user feedback from the CLARIN community. Finally there is an overview of additional ideas and improvements for future versions of the VLO.
  • Váradi, T., Wittenburg, P., Krauwer, S., Wynne, M., & Koskenniemi, K. (2008). CLARIN: Common language resources and technology infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
  • Viebahn, M. C., Ernestus, M., & McQueen, J. M. (2012). Co-occurrence of reduced word forms in natural speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2019-2022).

    Abstract

    This paper presents a corpus study that investigates the co-occurrence of reduced word forms in natural speech. We extracted Dutch past participles from three different speech registers and investigated the influence of several predictor variables on the presence and duration of schwas in prefixes and /t/s in suffixes. Our results suggest that reduced word forms tend to co-occur even if we partial out the effect of speech rate. The implications of our findings for episodic and abstractionist models of lexical representation are discussed.
  • Vosse, T. G., & Kempen, G. (2008). Parsing verb-final clauses in German: Garden-path and ERP effects modeled by a parallel dynamic parser. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Conference on the Cognitive Science Society (pp. 261-266). Washington: Cognitive Science Society.

    Abstract

    Experimental sentence comprehension studies have shown that superficially similar German clauses with verb-final word order elicit very different garden-path and ERP effects. We show that a computer implementation of the Unification Space parser (Vosse & Kempen, 2000) in the form of a localist-connectionist network can model the observed differences, at least qualitatively. The model embodies a parallel dynamic parser that, in contrast with existing models, does not distinguish between consecutive first-pass and reanalysis stages, and does not use semantic or thematic roles. It does use structural frequency data and animacy information.
  • Warner, N., Jongman, A., Mucke, D., & Cutler, A. (2001). The phonological status of schwa insertion in Dutch: An EMA study. In B. Maassen, W. Hulstijn, R. Kent, H. Peters, & P. v. Lieshout (Eds.), Speech motor control in normal and disordered speech: 4th International Speech Motor Conference (pp. 86-89). Nijmegen: Vantilt.

    Abstract

    Articulatory data are used to address the question of whether Dutch schwa insertion is a phonological or a phonetic process. By investigating tongue tip raising and dorsal lowering, we show that /l/ when it appears before inserted schwa is a light /l/, just as /l/ before an underlying schwa is, and unlike the dark /l/ before a consonant in non-insertion productions of the same words. The fact that inserted schwa can condition the light/dark /l/ alternation shows that schwa insertion involves the phonological insertion of a segment rather than phonetic adjustments to articulations.
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to
    neighboring segments and are less well identified. Indeed, many vowels,
    especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Weber, A., & Melinger, A. (2008). Name dominance in spoken word recognition is (not) modulated by expectations: Evidence from synonyms. In A. Botinis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (ExLing 2008) (pp. 225-228). Athens: University of Athens.

    Abstract

    Two German eye-tracking experiments tested whether top-down expectations interact with acoustically-driven word-recognition processes. Competitor objects with two synonymous names were paired with target objects whose names shared word onsets with either the dominant or the non-dominant name of the competitor. Non-dominant names of competitor objects were either introduced before the test session or not. Eye-movements were monitored while participants heard instructions to click on target objects. Results demonstrate dominant and non-dominant competitor names were considered for recognition, regardless of top-down expectations, though dominant names were always activated more strongly.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Weber, A. (2000). Phonotactic and acoustic cues for word segmentation in English. In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000) (pp. 782-785).

    Abstract

    This study investigates the influence of both phonotactic and acoustic cues on the segmentation of spoken English. Listeners detected embedded English words in nonsense sequences (word spotting). Words aligned with phonotactic boundaries were easier to detect than words without such alignment. Acoustic cues to boundaries could also have signaled word boundaries, especially when word onsets lacked phonotactic alignment. However, only one of several durational boundary cues showed a marginally significant correlation with response times (RTs). The results suggest that word segmentation in English is influenced primarily by phonotactic constraints and only secondarily by acoustic aspects of the speech signal.
  • Weber, A. (2000). The role of phonotactics in the segmentation of native and non-native continuous speech. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP, Workshop on Spoken Word Access Processes. Nijmegen: MPI for Psycholinguistics.

    Abstract

    Previous research has shown that listeners make use of their knowledge of phonotactic constraints to segment speech into individual words. The present study investigates the influence of phonotactics when segmenting a non-native language. German and English listeners detected embedded English words in nonsense sequences. German listeners also had knowledge of English, but English listeners had no knowledge of German. Word onsets were either aligned with a syllable boundary or not, according to the phonotactics of the two languages. Words aligned with either German or English phonotactic boundaries were easier for German listeners to detect than words without such alignment. Responses of English listeners were influenced primarily by English phonotactic alignment. The results suggest that both native and non-native phonotactic constraints influence lexical segmentation of a non-native, but familiar, language.
  • Weber, A. (2008). What the eyes can tell us about spoken-language comprehension [Abstract]. Journal of the Acoustical Society of America, 124, 2474-2474.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Windhouwer, M., Broeder, D., & Van Uytvanck, D. (2012). A CMD core model for CLARIN web services. In Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 41-48).

    Abstract

    In the CLARIN infrastructure various national projects have started initiatives to allow users of the infrastructure to create chains or workflows of web services. The Component Metadata (CMD) core model for web services described in this paper tries to align the metadata descriptions of these various initiatives. This should allow chaining/workflow engines to find matching and invoke services. The paper describes the landscape of web services architectures and the state of the national initiatives. Based on this a CMD core model for CLARIN is proposed, which, within some limits, can be adapted to the specific needs of an initiative by the standard facilities of CMD. The paper closes with the current state and usage of the model and a look into the future.
  • Windhouwer, M. (2012). RELcat: a Relation Registry for ISOcat data categories. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3661-3664). European Language Resources Association (ELRA).

    Abstract

    The ISOcat Data Category Registry contains basically a flat and easily extensible list of data category specifications. To foster reuse and standardization only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains and to overcome possible proliferation of data categories more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows storing arbitrary relationships. These relationships can reflect the personal view of one linguist or a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. This allows on one hand to load existing sets of relations specified in, for example, an OWL (2) ontology or SKOS taxonomy. And on the other hand allows algorithms that query the registry to traverse the stored semantic network to remain ignorant of the original source vocabulary. This paper describes first experiences with RELcat and explains some initial design decisions.
  • Windhouwer, M. (2012). Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, (SFLR 2012 workshop), September 19-21, 2012, Vienna (pp. 494). Vienna: Österreichischen Gesellschaft für Artificial Intelligende (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Withers, P. (2012). Metadata management with Arbil. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou, & M. Monachini (Eds.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 72-75). European Language Resources Association (ELRA).

    Abstract

    Arbil is an application designed to create and manage metadata for research data and to arrange this data into a structure appropriate for archiving. The metadata is displayed in tables, which allows an overview of the metadata and the ability to populate and update many metadata sections in bulk. Both IMDI and Clarin metadata formats are supported and Arbil has been designed as a local application so that it can also be used offline, for instance in remote field sites. The metadata can be entered in any order or at any stage that the user is able; once the metadata and its data are ready for archiving and an Internet connection is available it can be exported from Arbil and in the case of IMDI it can then be transferred to the main archive via LAMUS (archive management and upload system).

Share this page