Publications

Displaying 101 - 200 of 246
  • Hanulikova, A. (2008). Word recognition in possible word contexts. In M. Kokkonidis (Ed.), Proceedings of LingO 2007 (pp. 92-99). Oxford: Faculty of Linguistics, Philology, and Phonetics, University of Oxford.

    Abstract

    The Possible-Word Constraint (PWC; Norris, McQueen, Cutler, and Butterfield 1997) suggests that segmentation of continuous speech operates with a universal constraint that feasible words should contain a vowel. Single consonants, because they do not constitute syllables, are treated as non-viable residues. Two word-spotting experiments are reported that investigate whether the PWC really is a language-universal principle. According to the PWC, Slovak listeners should, just like Germans, be slower at spotting words in single consonant contexts (not feasible words) as compared to syllable contexts (feasible words)—even if single consonants can be words in Slovak. The results confirm the PWC in German but not in Slovak.
  • Harbusch, K., & Kempen, G. (2000). Complexity of linear order computation in Performance Grammar, TAG and HPSG. In Proceedings of Fifth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+5) (pp. 101-106).

    Abstract

    This paper investigates the time and space complexity of word order computation in the psycholinguistically motivated grammar formalism of Performance Grammar (PG). In PG, the first stage of syntax assembly yields an unordered tree ('mobile') consisting of a hierarchy of lexical frames (lexically anchored elementary trees). Associated with each lexica l frame is a linearizer—a Finite-State Automaton that locally computes the left-to-right order of the branches of the frame. Linearization takes place after the promotion component may have raised certain constituents (e.g. Wh- or focused phrases) into the domain of lexical frames higher up in the syntactic mobile. We show that the worst-case time and space complexity of analyzing input strings of length n is O(n5) and O(n4), respectively. This result compares favorably with the time complexity of word-order computations in Tree Adjoining Grammar (TAG). A comparison with Head-Driven Phrase Structure Grammar (HPSG) reveals that PG yields a more declarative linearization method, provided that the FSA is rewritten as an equivalent regular expression.
  • Harbusch, K., & Kempen, G. (2006). ELLEIPO: A module that computes coordinative ellipsis for language generators that don't. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2006) (pp. 115-118).

    Abstract

    Many current sentence generators lack the ability to compute elliptical versions of coordinated clauses in accordance with the rules for Gapping, Forward and Backward Conjunction Reduction, and SGF (Subject Gap in clauses with Finite/ Fronted verb). We describe a module (implemented in JAVA, with German and Dutch as target languages) that takes non-elliptical coordinated clauses as input and returns all reduced versions licensed by coordinative ellipsis. It is loosely based on a new psycholinguistic theory of coordinative ellipsis proposed by Kempen. In this theory, coordinative ellipsis is not supposed to result from the application of declarative grammar rules for clause formation but from a procedural component that interacts with the sentence generator and may block the overt expression of certain constituents.
  • Harbusch, K., Kempen, G., Van Breugel, C., & Koch, U. (2006). A generation-oriented workbench for performance grammar: Capturing linear order variability in German and Dutch. In Proceedings of the 4th International Natural Language Generation Conference (pp. 9-11).

    Abstract

    We describe a generation-oriented workbench for the Performance Grammar (PG) formalism, highlighting the treatment of certain word order and movement constraints in Dutch and German. PG enables a simple and uniform treatment of a heterogeneous collection of linear order phenomena in the domain of verb constructions (variably known as Cross-serial Dependencies, Verb Raising, Clause Union, Extraposition, Third Construction, Particle Hopping, etc.). The central data structures enabling this feature are clausal “topologies”: one-dimensional arrays associated with clauses, whose cells (“slots”) provide landing sites for the constituents of the clause. Movement operations are enabled by unification of lateral slots of topologies at adjacent levels of the clause hierarchy. The PGW generator assists the grammar developer in testing whether the implemented syntactic knowledge allows all and only the well-formed permutations of constituents.
  • Harbusch, K., Kempen, G., & Vosse, T. (2008). A natural-language paraphrase generator for on-line monitoring and commenting incremental sentence construction by L2 learners of German. In Proceedings of WorldCALL 2008.

    Abstract

    Certain categories of language learners need feedback on the grammatical structure of sentences they wish to produce. In contrast with the usual NLP approach to this problem—parsing student-generated texts—we propose a generation-based approach aiming at preventing errors (“scaffolding”). In our ICALL system, students construct sentences by composing syntactic trees out of lexically anchored “treelets” via a graphical drag&drop user interface. A natural-language generator computes all possible grammatically well-formed sentences entailed by the student-composed tree, and intervenes immediately when the latter tree does not belong to the set of well-formed alternatives. Feedback is based on comparisons between the student-composed tree and the well-formed set. Frequently occurring errors are handled in terms of “malrules.” The system (implemented in JAVA and C++) currently focuses constituent order in German as L2.
  • Herbst, L. E. (2006). The influence of language dominance on bilingual VOT: A case study. In Proceedings of the 4th University of Cambridge Postgraduate Conference on Language Research (CamLing 2006) (pp. 91-98). Cambridge: Cambridge University Press.

    Abstract

    Longitudinally collected VOT data from an early English-Italian bilingual who became increasingly English-dominant was analyzed. Stops in English were always produced with significantly longer VOT than in Italian. However, the speaker did not show any significant change in the VOT production in either language over time, despite the clear dominance of English in his every day language use later in his life. The results indicate that – unlike L2 learners – early bilinguals may remain unaffected by language use with respect to phonetic realization.
  • Holler, J., & Stevens, R. (2006). How speakers represent size information in referential communication for knowing and unknowing recipients. In D. Schlangen, & R. Fernandez (Eds.), Brandial '06 Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, Potsdam, Germany, September 11-13.
  • Holler, J., Kelly, S., Hagoort, P., & Ozyurek, A. (2012). When gestures catch the eye: The influence of gaze direction on co-speech gesture comprehension in triadic communication. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 467-472). Austin, TX: Cognitive Society. Retrieved from http://mindmodeling.org/cogsci2012/papers/0092/index.html.

    Abstract

    Co-speech gestures are an integral part of human face-to-face communication, but little is known about how pragmatic factors influence our comprehension of those gestures. The present study investigates how different types of recipients process iconic gestures in a triadic communicative situation. Participants (N = 32) took on the role of one of two recipients in a triad and were presented with 160 video clips of an actor speaking, or speaking and gesturing. Crucially, the actor’s eye gaze was manipulated in that she alternated her gaze between the two recipients. Participants thus perceived some messages in the role of addressed recipient and some in the role of unaddressed recipient. In these roles, participants were asked to make judgements concerning the speaker’s messages. Their reaction times showed that unaddressed recipients did comprehend speaker’s gestures differently to addressees. The findings are discussed with respect to automatic and controlled processes involved in gesture comprehension.
  • Indefrey, P., & Gullberg, M. (Eds.). (2008). Time to speak: Cognitive and neural prerequisites for time in language [Special Issue]. Language Learning, 58(suppl. 1).

    Abstract

    Time is a fundamental aspect of human cognition and action. All languages have developed rich means to express various facets of time, such as bare time spans, their position on the time line, or their duration. The articles in this volume give an overview of what we know about the neural and cognitive representations of time that speakers can draw on in language. Starting with an overview of the main devices used to encode time in natural language, such as lexical elements, tense and aspect, the research presented in this volume addresses the relationship between temporal language, culture, and thought, the relationship between verb aspect and mental simulations of events, the development of temporal concepts, time perception, the storage and retrieval of temporal information in autobiographical memory, and neural correlates of tense processing and sequence planning. The psychological and neurobiological findings presented here will provide important insights to inform and extend current studies of time in language and in language acquisition.
  • Isaac, A., Matthezing, H., Van der Meij, L., Schlobach, S., Wang, S., & Zinn, C. (2008). Putting ontology alignment in context: Usage, scenarios, deployment and evaluation in a library case. In S. Bechhofer, M. Hauswirth, J. Hoffmann, & M. Koubarakis (Eds.), The semantic web: Research and applications (pp. 402-417). Berlin: Springer.

    Abstract

    Thesaurus alignment plays an important role in realising efficient access to heterogeneous Cultural Heritage data. Current ontology alignment techniques, however, provide only limited value for such access as they consider little if any requirements from realistic use cases or application scenarios. In this paper, we focus on two real-world scenarios in a library context: thesaurus merging and book re-indexing. We identify their particular requirements and describe our approach of deploying and evaluating thesaurus alignment techniques in this context. We have applied our approach for the Ontology Alignment Evaluation Initiative, and report on the performance evaluation of participants’ tools wrt. the application scenario at hand. It shows that evaluations of tools requires significant effort, but when done carefully, brings many benefits.
  • Janse, E., Sennema, A., & Slis, A. (2000). Fast speech timing in Dutch: The durational correlates of lexical stress and pitch accent. In Proceedings of the VIth International Conference on Spoken Language Processing, Vol. III (pp. 251-254).

    Abstract

    n this study we investigated the durational correlates of lexical stress and pitch accent at normal and fast speech rate in Dutch. Previous literature on English shows that durations of lexically unstressed vowels are reduced more than stressed vowels when speakers increase their speech rate. We found that the same holds for Dutch, irrespective of whether the unstressed vowel is schwa or a "full" vowel. In the same line, we expected that vowels in words without a pitch accent would be shortened relatively more than vowels in words with a pitch accent. This was not the case: if anything, the accented vowels were shortened relatively more than the unaccented vowels. We conclude that duration is an important cue for lexical stress, but not for pitch accent.
  • Janse, E. (2000). Intelligibility of time-compressed speech: Three ways of time-compression. In Proceedings of the VIth International Conference on Spoken Language Processing, vol. III (pp. 786-789).

    Abstract

    Studies on fast speech have shown that word-level timing of fast speech differs from that of normal rate speech in that unstressed syllables are shortened more than stressed syllables as speech rate increases. An earlier experiment showed that the intelligibility of time-compressed speech could not be improved by making its temporal organisation closer to natural fast speech. To test the hypothesis that segmental intelligibility is more important than prosodic timing in listening to timecompressed speech, the intelligibility of bisyllabic words was tested in three time-compression conditions: either stressed and unstressed syllable were compressed to the same degree, or the stressed syllable was compressed more than the unstressed syllable, or the reverse. As was found before, imitating wordlevel timing of fast speech did not improve intelligibility over linear compression. However, the results did not confirm the hypothesis either: there was no difference in intelligibility between the three compression conditions. We conclude that segmental intelligibility plays an important role, but further research is necessary to decide between the contributions of prosody and segmental intelligibility to the word-level intelligibility of time-compressed speech.
  • Janse, E., & Quené, H. (1999). On the suitability of the cross-modal semantic priming task. In Proceedings of the XIVth International Congress of Phonetic Sciences (pp. 1937-1940).
  • Jesse, A., & Johnson, E. K. (2008). Audiovisual alignment in child-directed speech facilitates word learning. In Proceedings of the International Conference on Auditory-Visual Speech Processing (pp. 101-106). Adelaide, Aust: Causal Productions.

    Abstract

    Adult-to-child interactions are often characterized by prosodically-exaggerated speech accompanied by visually captivating co-speech gestures. In a series of adult studies, we have shown that these gestures are linked in a sophisticated manner to the prosodic structure of adults' utterances. In the current study, we use the Preferential Looking Paradigm to demonstrate that two-year-olds can use the alignment of these gestures to speech to deduce the meaning of words.
  • Johnson, E. K., Jusczyk, P. W., Cutler, A., & Norris, D. (2000). The development of word recognition: The use of the possible-word constraint by 12-month-olds. In L. Gleitman, & A. Joshi (Eds.), Proceedings of CogSci 2000 (pp. 1034). London: Erlbaum.
  • Kempen, G. (1997). De ontdubbelde taalgebruiker: Maken taalproductie en taalperceptie gebruik van één en dezelfde syntactische processor? [Abstract]. In 6e Winter Congres NvP. Programma and abstracts (pp. 31-32). Nederlandse Vereniging voor Psychonomie.
  • Kempen, G., Kooij, A., & Van Leeuwen, T. (1997). Do skilled readers exploit inflectional spelling cues that do not mirror pronunciation? An eye movement study of morpho-syntactic parsing in Dutch. In Abstracts of the Orthography Workshop "What spelling changes". Nijmegen: Max Planck Institute for Psycholinguistics.
  • Kemps-Snijders, M., Klassmann, A., Zinn, C., Berck, P., Russel, A., & Wittenburg, P. (2008). Exploring and enriching a language resource archive via the web. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The ”download first, then process paradigm” is still the predominant working method amongst the research community. The web-based paradigm, however, offers many advantages from a tool development and data management perspective as they allow a quick adaptation to changing research environments. Moreover, new ways of combining tools and data are increasingly becoming available and will eventually enable a true web-based workflow approach, thus challenging the ”download first, then process” paradigm. The necessary infrastructure for managing, exploring and enriching language resources via the Web will need to be delivered by projects like CLARIN and DARIAH
  • Kemps-Snijders, M., Zinn, C., Ringersma, J., & Windhouwer, M. (2008). Ensuring semantic interoperability on lexical resources. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    In this paper, we describe a unifying approach to tackle data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to uniformly describe and manage lexica of different structures. LEXUS also makes use of a central Data Category Registry (DCR) to address terminological issues with regard to linguistic concepts as well as the handling of working and object languages. Finally, we report on ViCoS, a LEXUS extension, providing support for the definition of arbitrary semantic relations between lexical entries or parts thereof.
  • Kemps-Snijders, M., Ducret, J., Romary, L., & Wittenburg, P. (2006). An API for accessing the data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 2299-2302).
  • Kemps-Snijders, M., Windhouwer, M., Wittenburg, P., & Wright, S. E. (2008). ISOcat: Corralling data categories in the wild. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    To achieve true interoperability for valuable linguistic resources different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation, dubbed ISOcat, of the registry is currently under construction. This paper shortly describes the new data model for data categories that will be introduced in this implementation. It goes on with a sketch of the standardization process. Completed data categories can be reused by the community. This is done by either making a selection of data categories using the ISOcat web interface, or by other tools which interact with the ISOcat system using one of its various Application Programming Interfaces. Linguistic resources that use data categories from the registry should include persistent references, e.g. in the metadata or schemata of the resource, which point back to their origin. These data category references can then be used to determine if two or more resources share common semantics, thus providing a level of interoperability close to the source data and a promising layer for semantic alignment on higher levels
  • Kemps-Snijders, M., Nederhof, M.-J., & Wittenburg, P. (2006). LEXUS, a web-based tool for manipulating lexical resources. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1862-1865).
  • Klassmann, A., Offenga, F., Broeder, D., Skiba, R., & Wittenburg, P. (2006). Comparison of resource discovery methods. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 113-116).
  • Klein, W. (2000). Changing concepts of the nature-nurture debate. In R. Hide, J. Mittelstrass, & W. Singer (Eds.), Changing concepts of nature at the turn of the millenium: Proceedings plenary session of the Pontifical academy of sciences, 26-29 October 1998 (pp. 289-299). Vatican City: Pontificia Academia Scientiarum.
  • Klein, W., & Musan, R. (Eds.). (1999). Das deutsche Perfekt [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (113).
  • Klein, W., & Schnell, R. (Eds.). (2008). Literaturwissenschaft und Linguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (150).
  • Klein, W. (Ed.). (2008). Ist Schönheit messbar? [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 152.
  • Klein, W. (Ed.). (1976). Psycholinguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (23/24).
  • Klein, W. (Ed.). (1997). Technologischer Wandel in den Philologien [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (106).
  • Klein, W. (Ed.). (1984). Textverständlichkeit - Textverstehen [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (55).
  • Klein, W. (Ed.). (2000). Sprache des Rechts [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (118).
  • Klein, W. (Ed.). (1990). Sprache und Raum [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (78).
  • Klein, W. (Ed.). (1986). Sprachverfall [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (62).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1990). Zukunft der Sprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (79).
  • Koster, M., & Cutler, A. (1997). Segmental and suprasegmental contributions to spoken-word recognition in Dutch. In Proceedings of EUROSPEECH 97 (pp. 2167-2170). Grenoble, France: ESCA.

    Abstract

    Words can be distinguished by segmental differences or by suprasegmental differences or both. Studies from English suggest that suprasegmentals play little role in human spoken-word recognition; English stress, however, is nearly always unambiguously coded in segmental structure (vowel quality); this relationship is less close in Dutch. The present study directly compared the effects of segmental and suprasegmental mispronunciation on word recognition in Dutch. There was a strong effect of suprasegmental mispronunciation, suggesting that Dutch listeners do exploit suprasegmental information in word recognition. Previous findings indicating the effects of mis-stressing for Dutch differ with stress position were replicated only when segmental change was involved, suggesting that this is an effect of segmental rather than suprasegmental processing.
  • Kuzla, C., Mitterer, H., Ernestus, M., & Cutler, A. (2006). Perceptual compensation for voice assimilation of German fricatives. In P. Warren, & I. Watson (Eds.), Proceedings of the 11th Australasian International Conference on Speech Science and Technology (pp. 394-399).

    Abstract

    In German, word-initial lax fricatives may be produced with substantially reduced glottal vibration after voiceless obstruents. This assimilation occurs more frequently and to a larger extent across prosodic word boundaries than across phrase boundaries. Assimilatory devoicing makes the fricatives more similar to their tense counterparts and could thus hinder word recognition. The present study investigates how listeners cope with assimilatory devoicing. Results of a cross-modal priming experiment indicate that listeners compensate for assimilation in appropriate contexts. Prosodic structure moderates compensation for assimilation: Compensation occurs especially after phrase boundaries, where devoiced fricatives are sufficiently long to be confused with their tense counterparts.
  • Kuzla, C., Ernestus, M., & Mitterer, H. (2006). Prosodic structure affects the production and perception of voice-assimilated German fricatives. In R. Hoffmann, & H. Mixdorff (Eds.), Speech prosody 2006. Dresden: TUD Press.

    Abstract

    Prosodic structure has long been known to constrain phonological processes [1]. More recently, it has also been recognized as a source of fine-grained phonetic variation of speech sounds. In particular, segments in domain-initial position undergo prosodic strengthening [2, 3], which also implies more resistance to coarticulation in higher prosodic domains [5]. The present study investigates the combined effects of prosodic strengthening and assimilatory devoicing on word-initial fricatives in German, the functional implication of both processes for cues to the fortis-lenis contrast, and the influence of prosodic structure on listeners’ compensation for assimilation. Results indicate that 1. Prosodic structure modulates duration and the degree of assimilatory devoicing, 2. Phonological contrasts are maintained by speakers, but differ in phonetic detail across prosodic domains, and 3. Compensation for assimilation in perception is moderated by prosodic structure and lexical constraints.
  • Kuzla, C., Mitterer, H., & Ernestus, M. (2006). Compensation for assimilatory devoicing and prosodic structure in German fricative perception. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 43-44).
  • Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. (2000). On forgetful attractor network memories. In H. Malmgren, M. Borga, & L. Niklasson (Eds.), Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-1 Conference, Göteborg, Sweden, 13-16 May 2000 (pp. 54-62). Heidelberg: Springer Verlag.

    Abstract

    A recurrently connected attractor neural network with a Hebbian learning rule is currently our best ANN analogy for a piece cortex. Functionally biological memory operates on a spectrum of time scales with regard to induction and retention, and it is modulated in complex ways by sub-cortical neuromodulatory systems. Moreover, biological memory networks are commonly believed to be highly distributed and engage many co-operating cortical areas. Here we focus on the temporal aspects of induction and retention of memory in a connectionist type attractor memory model of a piece of cortex. A continuous time, forgetful Bayesian-Hebbian learning rule is described and compared to the characteristics of LTP and LTD seen experimentally. More generally, an attractor network implementing this learning rule can operate as a long-term, intermediate-term, or short-term memory. Modulation of the print-now signal of the learning rule replicates some experimental memory phenomena, like e.g. the von Restorff effect.
  • Lenkiewicz, P., Auer, E., Schreer, O., Masneri, S., Schneider, D., & Tschöpe, S. (2012). AVATecH ― automated annotation through audio and video analysis. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 209-214). European Language Resources Association.

    Abstract

    In different fields of the humanities annotations of multimodal resources are a necessary component of the research workflow. Examples include linguistics, psychology, anthropology, etc. However, creation of those annotations is a very laborious task, which can take 50 to 100 times the length of the annotated media, or more. This can be significantly improved by applying innovative audio and video processing algorithms, which analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, which is a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2008). Accelerating 3D medical image segmentation with high performance computing. In Proceedings of the IEEE International Workshops on Image Processing Theory, Tools and Applications - IPT (pp. 1-8).

    Abstract

    Digital processing of medical images has helped physicians and patients during past years by allowing examination and diagnosis on a very precise level. Nowadays possibly the biggest deal of support it can offer for modern healthcare is the use of high performance computing architectures to treat the huge amounts of data that can be collected by modern acquisition devices. This paper presents a parallel processing implementation of an image segmentation algorithm that operates on a computer cluster equipped with 10 processing units. Thanks to well-organized distribution of the workload we manage to significantly shorten the execution time of the developed algorithm and reach a performance gain very close to linear.
  • Lenkiewicz, A., Lis, M., & Lenkiewicz, P. (2012). Linguistic concepts described with Media Query Language for automated annotation. In J. C. Meiser (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 477-479).

    Abstract

    Introduction Human spoken communication is multimodal, i.e. it encompasses both speech and gesture. Acoustic properties of voice, body movements, facial expression, etc. are an inherent and meaningful part of spoken interaction; they can provide attitudinal, grammatical and semantic information. In the recent years interest in audio-visual corpora has been rising rapidly as they enable investigation of different communicative modalities and provide more holistic view on communication (Kipp et al. 2009). Moreover, for some languages such corpora are the only available resource, as is the case for endangered languages for which no written resources exist.
  • Lenkiewicz, P., Van Uytvanck, D., Wittenburg, P., & Drude, S. (2012). Towards automated annotation of audio and video recordings by application of advanced web-services. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1880-1883).

    Abstract

    In this paper we describe audio and video processing algorithms that are developed in the scope of AVATecH project. The purpose of these algorithms is to shorten the time taken by manual annotation of audio and video recordings by extracting features from media files and creating semi-automated annotations. We show that the use of such supporting algorithms can shorten the annotation time to 30-50% of the time necessary to perform a fully manual annotation of the same kind.
  • Levelt, W. J. M., & Plomp, R. (1962). Musical consonance and critical bandwidth. In Proceedings of the 4th International Congress Acoustics (pp. 55-55).
  • Levelt, W. J. M. (1984). Spontaneous self-repairs in speech: Processes and representations. In M. P. R. Van den Broecke, & A. Cohen (Eds.), Proceedings of the 10th International Congress of Phonetic Sciences (pp. 105-117). Dordrecht: Foris.
  • Levinson, S. C. (2006). On the human "interaction engine". In N. J. Enfield, & S. C. Levinson (Eds.), Roots of human sociality: Culture, cognition and interaction (pp. 39-69). Oxford: Berg.
  • Levinson, S. C. (2000). Language as nature and language as art. In J. Mittelstrass, & W. Singer (Eds.), Proceedings of the Symposium on ‘Changing concepts of nature and the turn of the Millennium (pp. 257-287). Vatican City: Pontificae Academiae Scientiarium Scripta Varia.
  • Levinson, S. C. (2000). H.P. Grice on location on Rossel Island. In S. S. Chang, L. Liaw, & J. Ruppenhofer (Eds.), Proceedings of the 25th Annual Meeting of the Berkeley Linguistic Society (pp. 210-224). Berkeley: Berkeley Linguistic Society.
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Majid, A., Enfield, N. J., & Van Staden, M. (Eds.). (2006). Parts of the body: Cross-linguistic categorisation [Special Issue]. Language Sciences, 28(2-3).
  • Majid, A. (2012). Taste in twenty cultures [Abstract]. Abstracts from the XXIth Congress of European Chemoreception Research Organization, ECRO-2011. Publ. in Chemical Senses, 37(3), A10.

    Abstract

    Scholars disagree about the extent to which language can tell us
    about conceptualisation of the world. Some believe that language
    is a direct window onto concepts: Having a word ‘‘bird’’, ‘‘table’’ or
    ‘‘sour’’ presupposes the corresponding underlying concept, BIRD,
    TABLE, SOUR. Others disagree. Words are thought to be uninformative,
    or worse, misleading about our underlying conceptual representations;
    after all, our mental worlds are full of ideas that we
    struggle to express in language. How could this be so, argue sceptics,
    if language were a direct window on our inner life? In this presentation,
    I consider what language can tell us about the
    conceptualisation of taste. By considering linguistic data from
    twenty unrelated cultures – varying in subsistence mode (huntergatherer
    to industrial), ecological zone (rainforest jungle to desert),
    dwelling type (rural and urban), and so forth – I argue any single language is, indeed, impoverished about what it can reveal about
    taste. But recurrent lexicalisation patterns across languages can
    provide valuable insights about human taste experience. Moreover,
    language patterning is part of the data that a good theory of taste
    perception has to be answerable for. Taste researchers, therefore,
    cannot ignore the crosslinguistic facts.
  • Majid, A., Boroditsky, L., & Gaby, A. (Eds.). (2012). Time in terms of space [Research topic] [Special Issue]. Frontiers in cultural psychology. Retrieved from http://www.frontiersin.org/cultural_psychology/researchtopics/Time_in_terms_of_space/755.

    Abstract

    This Research Topic explores the question: what is the relationship between representations of time and space in cultures around the world? This question touches on the broader issue of how humans come to represent and reason about abstract entities – things we cannot see or touch. Time is a particularly opportune domain to investigate this topic. Across cultures, people use spatial representations for time, for example in graphs, time-lines, clocks, sundials, hourglasses, and calendars. In language, time is also heavily related to space, with spatial terms often used to describe the order and duration of events. In English, for example, we might move a meeting forward, push a deadline back, attend a long concert or go on a short break. People also make consistent spatial gestures when talking about time, and appear to spontaneously invoke spatial representations when processing temporal language. A large body of evidence suggests a close correspondence between temporal and spatial language and thought. However, the ways that people spatialize time can differ dramatically across languages and cultures. This research topic identifies and explores some of the sources of this variation, including patterns in spatial thinking, patterns in metaphor, gesture and other cultural systems. This Research Topic explores how speakers of different languages talk about time and space and how they think about these domains, outside of language. The Research Topic invites papers exploring the following issues: 1. Do the linguistic representations of space and time share the same lexical and morphosyntactic resources? 2. To what extent does the conceptualization of time follow the conceptualization of space?
  • Malaisé, V., Aroyo, L., Brugman, H., Gazendam, L., De Jong, A., Negru, C., & Schreiber, G. (2006). Evaluating a thesaurus browser for an audio-visual archive. In S. Staab, & V. Svatek (Eds.), Managing knowledge in a world of networks (pp. 272-286). Berlin: Springer.
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Positive and negative influences of the lexicon on phonemic decision-making. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 778-781). Beijing: China Military Friendship Publish.

    Abstract

    Lexical knowledge influences how human listeners make decisions about speech sounds. Positive lexical effects (faster responses to target sounds in words than in nonwords) are robust across several laboratory tasks, while negative effects (slower responses to targets in more word-like nonwords than in less word-like nonwords) have been found in phonetic decision tasks but not phoneme monitoring tasks. The present experiments tested whether negative lexical effects are therefore a task-specific consequence of the forced choice required in phonetic decision. We compared phoneme monitoring and phonetic decision performance using the same Dutch materials in each task. In both experiments there were positive lexical effects, but no negative lexical effects. We observe that in all studies showing negative lexical effects, the materials were made by cross-splicing, which meant that they contained perceptual evidence supporting the lexically-consistent phonemes. Lexical knowledge seems to influence phonemic decision-making only when there is evidence for the lexically-consistent phoneme in the speech signal.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Why Merge really is autonomous and parsimonious. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 47-50). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    We briefly describe the Merge model of phonemic decision-making, and, in the light of general arguments about the possible role of feedback in spoken-word recognition, defend Merge's feedforward structure. Merge not only accounts adequately for the data, without invoking feedback connections, but does so in a parsimonious manner.
  • Melinger, A., Schulte im Walde, S., & Weber, A. (2006). Characterizing response types and revealing noun ambiguity in German association norms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento: Association for Computational Linguistics.

    Abstract

    This paper presents an analysis of semantic association norms for German nouns. In contrast to prior studies, we not only collected associations elicited by written representations of target objects but also by their pictorial representations. In a first analysis, we identified systematic differences in the type and distribution of associate responses for the two presentation forms. In a second analysis, we applied a soft cluster analysis to the collected target-response pairs. We subsequently used the clustering to predict noun ambiguity and to discriminate senses in our target nouns.
  • Meyer, A. S., & Wheeldon, L. (Eds.). (2006). Language production across the life span [Special Issue]. Language and Cognitive Processes, 21(1-3).
  • Mitterer, H. (Ed.). (2012). Ecological aspects of speech perception [Research topic] [Special Issue]. Frontiers in Cognition.

    Abstract

    Our knowledge of speech perception is largely based on experiments conducted with carefully recorded clear speech presented under good listening conditions to undistracted listeners - a near-ideal situation, in other words. But the reality poses a set of different challenges. First of all, listeners may need to divide their attention between speech comprehension and another task (e.g., driving). Outside the laboratory, the speech signal is often slurred by less than careful pronunciation and the listener has to deal with background noise. Moreover, in a globalized world, listeners need to understand speech in more than their native language. Relatedly, the speakers we listen to often have a different language background so we have to deal with a foreign or regional accent we are not familiar with. Finally, outside the laboratory, speech perception is not an end in itself, but rather a mean to contribute to a conversation. Listeners do not only need to understand the speech they are hearing, they also need to use this information to plan and time their own responses. For this special topic, we invite papers that address any of these ecological aspects of speech perception.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Namjoshi, J., Tremblay, A., Broersma, M., Kim, S., & Cho, T. (2012). Influence of recent linguistic exposure on the segmentation of an unfamiliar language [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1968.

    Abstract

    Studies have shown that listeners segmenting unfamiliar languages transfer native-language (L1) segmentation cues. These studies, however, conflated L1 and recent linguistic exposure. The present study investigates the relative influences of L1 and recent linguistic exposure on the use of prosodic cues for segmenting an artificial language (AL). Participants were L1-French listeners, high-proficiency L2-French L1-English listeners, and L1-English listeners without functional knowledge of French. The prosodic cue assessed was F0 rise, which is word-final in French, but in English tends to be word-initial. 30 participants heard a 20-minute AL speech stream with word-final boundaries marked by F0 rise, and decided in a subsequent listening task which of two words (without word-final F0 rise) had been heard in the speech stream. The analyses revealed a marginally significant effect of L1 (all listeners) and, importantly, a significant effect of recent linguistic exposure (L1-French and L2-French listeners): accuracy increased with decreasing time in the US since the listeners’ last significant (3+ months) stay in a French-speaking environment. Interestingly, no effect of L2 proficiency was found (L2-French listeners).
  • Nordhoff, S., & Hammarström, H. (2012). Glottolog/Langdoc: Increasing the visibility of grey literature for low-density languages. In N. Calzolari (Ed.), Proceedings of the 8th International Conference on Language Resources and Evaluation [LREC 2012], May 23-25, 2012 (pp. 3289-3294). [Paris]: ELRA.

    Abstract

    Language resources can be divided into structural resources treating phonology, morphosyntax, semantics etc. and resources treating the social, demographic, ethnic, political context. A third type are meta-resources, like bibliographies, which provide access to the resources of the first two kinds. This poster will present the Glottolog/Langdoc project, a comprehensive bibliography providing web access to 180k bibliographical records to (mainly) low visibility resources from low-density languages. The resources are annotated for macro-area, content language, and document type and are available in XHTML and RDF.
  • Norris, D., Cutler, A., McQueen, J. M., Butterfield, S., & Kearns, R. K. (2000). Language-universal constraints on the segmentation of English. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 43-46). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) [1] is a language-specific or language-universal strategy for the segmentation of continuous speech. The PWC disfavours parses which leave an impossible residue between the end of a candidate word and a known boundary. The experiments examined cases where the residue was either a CV syllable with a lax vowel, or a CVC syllable with a schwa. Although neither syllable context is a possible word in English, word-spotting in both contexts was easier than with a context consisting of a single consonant. The PWC appears to be language-universal rather than language-specific.
  • Norris, D., Cutler, A., & McQueen, J. M. (2000). The optimal architecture for simulating spoken-word recognition. In C. Davis, T. Van Gelder, & R. Wales (Eds.), Cognitive Science in Australia, 2000: Proceedings of the Fifth Biennial Conference of the Australasian Cognitive Science Society. Adelaide: Causal Productions.

    Abstract

    Simulations explored the inability of the TRACE model of spoken-word recognition to model the effects on human listening of subcategorical mismatch in word forms. The source of TRACE's failure lay not in interactive connectivity, not in the presence of inter-word competition, and not in the use of phonemic representations, but in the need for continuously optimised interpretation of the input. When an analogue of TRACE was allowed to cycle to asymptote on every slice of input, an acceptable simulation of the subcategorical mismatch data was achieved. Even then, however, the simulation was not as close as that produced by the Merge model, which has inter-word competition, phonemic representations and continuous optimisation (but no interactive connectivity).
  • Offenga, F., Broeder, D., Wittenburg, P., Ducret, J., & Romary, L. (2006). Metadata profile in the ISO data category registry. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1866-1869).
  • Otake, T., & Cutler, A. (2000). A set of Japanese word cohorts rated for relative familiarity. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 766-769). Beijing: China Military Friendship Publish.

    Abstract

    A database is presented of relative familiarity ratings for 24 sets of Japanese words, each set comprising words overlapping in the initial portions. These ratings are useful for the generation of material sets for research in the recognition of spoken words.
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Ozyurek, A., & Ozcaliskan, S. (2000). How do children learn to conflate manner and path in their speech and gestures? Differences in English and Turkish. In E. V. Clark (Ed.), The proceedings of the Thirtieth Child Language Research Forum (pp. 77-85). Stanford: CSLI Publications.
  • Ozyurek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn, & S. C. Stoness (Eds.), Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.
  • Pallier, C., Cutler, A., & Sebastian-Galles, N. (1997). Prosodic structure and phonetic processing: A cross-linguistic study. In Proceedings of EUROSPEECH 97 (pp. 2131-2134). Grenoble, France: ESCA.

    Abstract

    Dutch and Spanish differ in how predictable the stress pattern is as a function of the segmental content: it is correlated with syllable weight in Dutch but not in Spanish. In the present study, two experiments were run to compare the abilities of Dutch and Spanish speakers to separately process segmental and stress information. It was predicted that the Spanish speakers would have more difficulty focusing on the segments and ignoring the stress pattern than the Dutch speakers. The task was a speeded classification task on CVCV syllables, with blocks of trials in which the stress pattern could vary versus blocks in which it was fixed. First, we found interference due to stress variability in both languages, suggesting that the processing of segmental information cannot be performed independently of stress. Second, the effect was larger for Spanish than for Dutch, suggesting that that the degree of interference from stress variation may be partially mitigated by the predictability of stress placement in the language.
  • Papafragou, A., & Ozturk, O. (2006). The acquisition of epistemic modality. In A. Botinis (Ed.), Proceedings of ITRW on Experimental Linguistics in ExLing-2006 (pp. 201-204). ISCA Archive.

    Abstract

    In this paper we try to contribute to the body of knowledge about the acquisition of English epistemic modal verbs (e.g. Mary may/has to be at school). Semantically, these verbs encode possibility or necessity with respect to available evidence. Pragmatically, the use of epistemic modals often gives rise to scalar conversational inferences (Mary may be at school -> Mary doesn’t have to be at school). The acquisition of epistemic modals is challenging for children on both these levels. In this paper, we present findings from two studies which were conducted with 5-year-old children and adults. Our findings, unlike previous work, show that 5-yr-olds have mastered epistemic modal semantics, including the notions of necessity and possibility. However, they are still in the process of acquiring epistemic modal pragmatics.
  • Pereiro Estevan, Y., Wan, V., Scharenborg, O., & Gallardo Antolín, A. (2006). Segmentación de fonemas no supervisada basada en métodos kernel de máximo margen. In Proceedings of IV Jornadas en Tecnología del Habla.

    Abstract

    En este artículo se desarrolla un método automático de segmentación de fonemas no supervisado. Este método utiliza el algoritmo de agrupación de máximo margen [1] para realizar segmentación de fonemas sobre habla continua sin necesidad de información a priori para el entrenamiento del sistema.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2006). The role of morphology in fine phonetic detail: The case of Dutch -igheid. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 53-54).
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2006). Effects of word frequency on the acoustic durations of affixes. In Proceedings of Interspeech 2006 (pp. 953-956). Pittsburgh: ICSLP.

    Abstract

    This study investigates whether the acoustic durations of derivational affixes in Dutch are affected by the frequency of the word they occur in. In a word naming experiment, subjects were presented with a large number of words containing one of the affixes ge-, ver-, ont, or -lijk. Their responses were recorded on DAT tapes, and the durations of the affixes were measured using Automatic Speech Recognition technology. To investigate whether frequency also affected durations when speech rate was high, the presentation rate of the stimuli was varied. The results show that a higher frequency of the word as a whole led to shorter acoustic realizations for all affixes. Furthermore, affixes became shorter as the presentation rate of the stimuli increased. There was no interaction between word frequency and presentation rate, suggesting that the frequency effect also applies in situations in which the speed of articulation is very high.
  • Poellmann, K., McQueen, J. M., & Mitterer, H. (2012). How talker-adaptation helps listeners recognize reduced word-forms [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 2053.

    Abstract

    Two eye-tracking experiments tested whether native listeners can adapt
    to reductions in casual Dutch speech. Listeners were exposed to segmental
    ([b] > [m]), syllabic (full-vowel-deletion), or no reductions. In a subsequent
    test phase, all three listener groups were tested on how efficiently they could
    recognize both types of reduced words. In the first Experiment’s exposure
    phase, the (un)reduced target words were predictable. The segmental reductions
    were completely consistent (i.e., involved the same input sequences).
    Learning about them was found to be pattern-specific and generalized in the
    test phase to new reduced /b/-words. The syllabic reductions were not consistent
    (i.e., involved variable input sequences). Learning about them was
    weak and not pattern-specific. Experiment 2 examined effects of word repetition
    and predictability. The (un-)reduced test words appeared in the exposure
    phase and were not predictable. There was no evidence of learning for
    the segmental reductions, probably because they were not predictable during
    exposure. But there was word-specific learning for the vowel-deleted words.
    The results suggest that learning about reductions is pattern-specific and
    generalizes to new words if the input is consistent and predictable. With
    variable input, there is more likely to be adaptation to a general speaking
    style and word-specific learning.
  • Poletiek, F. H., & Chater, N. (2006). Grammar induction profits from representative stimulus sampling. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 1968-1973). Austin, TX, USA: Cognitive Science Society.
  • Ravignani, A., & Fitch, W. T. (2012). Sonification of experimental parameters as a new method for efficient coding of behavior. In A. Spink, F. Grieco, O. E. Krips, L. W. S. Loijens, L. P. P. J. Noldus, & P. H. Zimmerman (Eds.), Measuring Behavior 2012, 8th International Conference on Methods and Techniques in Behavioral Research (pp. 376-379).

    Abstract

    Cognitive research is often focused on experimental condition-driven reactions. Ethological studies frequently
    rely on the observation of naturally occurring specific behaviors. In both cases, subjects are filmed during the
    study, so that afterwards behaviors can be coded on video. Coding should typically be blind to experimental
    conditions, but often requires more information than that present on video. We introduce a method for blindcoding
    of behavioral videos that takes care of both issues via three main innovations. First, of particular
    significance for playback studies, it allows creation of a “soundtrack” of the study, that is, a track composed of
    synthesized sounds representing different aspects of the experimental conditions, or other events, over time.
    Second, it facilitates coding behavior using this audio track, together with the possibly muted original video.
    This enables coding blindly to conditions as required, but not ignoring other relevant events. Third, our method
    makes use of freely available, multi-platform software, including scripts we developed.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Roberts, L., & Meyer, A. S. (Eds.). (2012). Individual differences in second language acquisition [Special Issue]. Language Learning, 62(Supplement S2).
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O., Witteman, M. J., & Weber, A. (2012). Computational modelling of the recognition of foreign-accented speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 882 -885).

    Abstract

    In foreign-accented speech, pronunciation typically deviates from the canonical form to some degree. For native listeners, it has been shown that word recognition is more difficult for strongly-accented words than for less strongly-accented words. Furthermore recognition of strongly-accented words becomes easier with additional exposure to the foreign accent. In this paper, listeners’ behaviour was simulated with Fine-tracker, a computational model of word recognition that uses real speech as input. The simulations showed that, in line with human listeners, 1) Fine-Tracker’s recognition outcome is modulated by the degree of accentedness and 2) it improves slightly after brief exposure with the accent. On the level of individual words, however, Fine-tracker failed to correctly simulate listeners’ behaviour, possibly due to differences in overall familiarity with the chosen accent (German-accented Dutch) between human listeners and Fine-Tracker.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, which are recorded in a wide variety of acoustic conditions. The results show, that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Scharenborg, O., & Janse, E. (2012). Hearing loss and the use of acoustic cues in phonetic categorisation of fricatives. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1458-1461).

    Abstract

    Aging often affects sensitivity to the higher frequencies, which results in the loss of sensitivity to phonetic detail in speech. Hearing loss may therefore interfere with the categorisation of two consonants that have most information to differentiate between them in those higher frequencies and less in the lower frequencies, e.g., /f/ and /s/. We investigate two acoustic cues, i.e., formant transitions and fricative intensity, that older listeners might use to differentiate between /f/ and /s/. The results of two phonetic categorisation tasks on 38 older listeners (aged 60+) with varying degrees of hearing loss indicate that older listeners seem to use formant transitions as a cue to distinguish /s/ from /f/. Moreover, this ability is not impacted by hearing loss. On the other hand, listeners with increased hearing loss seem to rely more on intensity for fricative identification. Thus, progressive hearing loss may lead to gradual changes in perceptual cue weighting.
  • Scharenborg, O., Janse, E., & Weber, A. (2012). Perceptual learning of /f/-/s/ by older listeners. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 398-401).

    Abstract

    Young listeners can quickly modify their interpretation of a speech sound when a talker produces the sound ambiguously. Young Dutch listeners rely mainly on the higher frequencies to distinguish between /f/ and /s/, but these higher frequencies are particularly vulnerable to age-related hearing loss. We therefore tested whether older Dutch listeners can show perceptual retuning given an ambiguous pronunciation in between /f/ and /s/. Results of a lexically-guided perceptual learning experiment showed that older Dutch listeners are still able to learn non-standard pronunciations of /f/ and /s/. Possibly, the older listeners have learned to rely on other acoustic cues, such as formant transitions, to distinguish between /f/ and /s/. However, the size and duration of the perceptual effect is influenced by hearing loss, with listeners with poorer hearing showing a smaller and a shorter-lived learning effect.
  • Schiller, N. O., Van Lieshout, P. H. H. M., Meyer, A. S., & Levelt, W. J. M. (1997). Is the syllable an articulatory unit in speech production? Evidence from an Emma study. In P. Wille (Ed.), Fortschritte der Akustik: Plenarvorträge und Fachbeiträge der 23. Deutschen Jahrestagung für Akustik (DAGA 97) (pp. 605-606). Oldenburg: DEGA.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Seuren, P. A. M. (1984). Logic and truth-values in language. In F. Landman, & F. Veltman (Eds.), Varieties of formal semantics: Proceedings of the fourth Amsterdam colloquium (pp. 343-364). Dordrecht: Foris.
  • Seuren, P. A. M., & Mufwene, S. S. (Eds.). (1990). Issues in Creole lingusitics [Special Issue]. Linguistics, 28(4).

Share this page