Publications

Displaying 101 - 200 of 232
  • Jesse, A., & Johnson, E. K. (2008). Audiovisual alignment in child-directed speech facilitates word learning. In Proceedings of the International Conference on Auditory-Visual Speech Processing (pp. 101-106). Adelaide, Aust: Causal Productions.

    Abstract

    Adult-to-child interactions are often characterized by prosodically-exaggerated speech accompanied by visually captivating co-speech gestures. In a series of adult studies, we have shown that these gestures are linked in a sophisticated manner to the prosodic structure of adults' utterances. In the current study, we use the Preferential Looking Paradigm to demonstrate that two-year-olds can use the alignment of these gestures to speech to deduce the meaning of words.
  • Johns, T. G., Perera, R. M., Vitali, A. A., Vernes, S. C., & Scott, A. (2004). Phosphorylation of a glioma-specific mutation of the EGFR [Abstract]. Neuro-Oncology, 6, 317.

    Abstract

    Mutations of the epidermal growth factor receptor (EGFR) gene are found at a relatively high frequency in glioma, with the most common being the de2-7 EGFR (or EGFRvIII). This mutation arises from an in-frame deletion of exons 2-7, which removes 267 amino acids from the extracellular domain of the receptor. Despite being unable to bind ligand, the de2-7 EGFR is constitutively active at a low level. Transfection of human glioma cells with the de2-7 EGFR has little effect in vitro, but when grown as tumor xenografts this mutated receptor imparts a dramatic growth advantage. We mapped the phosphorylation pattern of de2-7 EGFR, both in vivo and in vitro, using a panel of antibodies specific for different phosphorylated tyrosine residues. Phosphorylation of de2-7 EGFR was detected constitutively at all tyrosine sites surveyed in vitro and in vivo, including tyrosine 845, a known target in the wild-type EGFR for src kinase. There was a substantial upregulation of phosphorylation at every yrosine residue of the de2-7 EGFR when cells were grown in vivo compared to the receptor isolated from cells cultured in vitro. Upregulation of phosphorylation at tyrosine 845 could be stimulated in vitro by the addition of specific components of the ECM via an integrindependent mechanism. These observations may partially explain why the growth enhancement mediated by de2-7 EGFR is largely restricted to the in vivo environment
  • Johnson, E. K., Jusczyk, P. W., Cutler, A., & Norris, D. (2000). The development of word recognition: The use of the possible-word constraint by 12-month-olds. In L. Gleitman, & A. Joshi (Eds.), Proceedings of CogSci 2000 (pp. 1034). London: Erlbaum.
  • Kelly, S. D., & Ozyurek, A. (Eds.). (2007). Gesture, language, and brain [Special Issue]. Brain and Language, 101(3).
  • Kempen, G., & Harbusch, K. (1998). A 'tree adjoining' grammar without adjoining: The case of scrambling in German. In Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4).
  • Kempen, G., & Harbusch, K. (2004). How flexible is constituent order in the midfield of German subordinate clauses? A corpus study revealing unexpected rigidity. In S. Kepser, & M. Reis (Eds.), Pre-Proceedings of the International Conference on Linguistic Evidence (pp. 81-85). Tübingen: Niemeyer.
  • Kempen, G. (2004). Interactive visualization of syntactic structure assembly for grammar-intensive first- and second-language instruction. In R. Delmonte, P. Delcloque, & S. Tonelli (Eds.), Proceedings of InSTIL/ICALL2004 Symposium on NLP and speech technologies in advanced language learning systems (pp. 183-186). Venice: University of Venice.
  • Kempen, G., & Harbusch, K. (2004). How flexible is constituent order in the midfield of German subordinate clauses?: A corpus study revealing unexpected rigidity. In Proceedings of the International Conference on Linguistic Evidence (pp. 81-85). Tübingen: University of Tübingen.
  • Kempen, G. (2004). Human grammatical coding: Shared structure formation resources for grammatical encoding and decoding. In Cuny 2004 - The 17th Annual CUNY Conference on Human Sentence Processing. March 25-27, 2004. University of Maryland (pp. 66).
  • Kempen, G. (1996). Human language technology can modernize writing and grammar instruction. In COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2 (pp. 1005-1006). Stroudsburg, PA: Association for Computational Linguistics.
  • Kempen, G., & Hoenkamp, E. (1982). Incremental sentence generation: Implications for the structure of a syntactic processor. In J. Horecký (Ed.), COLING 82. Proceedings of the Ninth International Conference on Computational Linguistics, Prague, July 5-10, 1982 (pp. 151-156). Amsterdam: North-Holland.

    Abstract

    Human speakers often produce sentences incrementally. They can start speaking having in mind only a fragmentary idea of what they want to say, and while saying this they refine the contents underlying subsequent parts of the utterance. This capability imposes a number of constraints on the design of a syntactic processor. This paper explores these constraints and evaluates some recent computational sentence generators from the perspective of incremental production.
  • Kempen, G., & Janssen, S. (1996). Omspellen: Reuze(n)karwei of peule(n)schil? In H. Croll, & J. Creutzberg (Eds.), Proceedings of the 5e Dag van het Document (pp. 143-146). Projectbureau Croll en Creutzberg.
  • Kemps-Snijders, M., Klassmann, A., Zinn, C., Berck, P., Russel, A., & Wittenburg, P. (2008). Exploring and enriching a language resource archive via the web. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The ”download first, then process paradigm” is still the predominant working method amongst the research community. The web-based paradigm, however, offers many advantages from a tool development and data management perspective as they allow a quick adaptation to changing research environments. Moreover, new ways of combining tools and data are increasingly becoming available and will eventually enable a true web-based workflow approach, thus challenging the ”download first, then process” paradigm. The necessary infrastructure for managing, exploring and enriching language resources via the Web will need to be delivered by projects like CLARIN and DARIAH
  • Kemps-Snijders, M., Zinn, C., Ringersma, J., & Windhouwer, M. (2008). Ensuring semantic interoperability on lexical resources. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    In this paper, we describe a unifying approach to tackle data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to uniformly describe and manage lexica of different structures. LEXUS also makes use of a central Data Category Registry (DCR) to address terminological issues with regard to linguistic concepts as well as the handling of working and object languages. Finally, we report on ViCoS, a LEXUS extension, providing support for the definition of arbitrary semantic relations between lexical entries or parts thereof.
  • Kemps-Snijders, M., Windhouwer, M., Wittenburg, P., & Wright, S. E. (2008). ISOcat: Corralling data categories in the wild. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    To achieve true interoperability for valuable linguistic resources different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation, dubbed ISOcat, of the registry is currently under construction. This paper shortly describes the new data model for data categories that will be introduced in this implementation. It goes on with a sketch of the standardization process. Completed data categories can be reused by the community. This is done by either making a selection of data categories using the ISOcat web interface, or by other tools which interact with the ISOcat system using one of its various Application Programming Interfaces. Linguistic resources that use data categories from the registry should include persistent references, e.g. in the metadata or schemata of the resource, which point back to their origin. These data category references can then be used to determine if two or more resources share common semantics, thus providing a level of interoperability close to the source data and a promising layer for semantic alignment on higher levels
  • Khemlani, S., Leslie, S.-J., Glucksberg, S., & Rubio-Fernández, P. (2007). Do ducks lay eggs? How people interpret generic assertions. In D. S. McNamara, & J. G. Trafton (Eds.), Proceedings of the 29th Annual Conference of the Cognitive Science Society (CogSci 2007). Austin, TX: Cognitive Science Society.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and cospeech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaningbearing phase.
  • Klein, W. (Ed.). (2004). Philologie auf neuen Wegen [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 136.
  • Klein, W., & Von Stutterheim, C. (Eds.). (2007). Sprachliche Perspektivierung [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 145.
  • Klein, W. (Ed.). (2004). Universitas [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik (LiLi), 134.
  • Klein, W. (2000). Changing concepts of the nature-nurture debate. In R. Hide, J. Mittelstrass, & W. Singer (Eds.), Changing concepts of nature at the turn of the millenium: Proceedings plenary session of the Pontifical academy of sciences, 26-29 October 1998 (pp. 289-299). Vatican City: Pontificia Academia Scientiarum.
  • Klein, W., & Schnell, R. (Eds.). (2008). Literaturwissenschaft und Linguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (150).
  • Klein, W. (Ed.). (2008). Ist Schönheit messbar? [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 152.
  • Klein, W. (Ed.). (1998). Kaleidoskop [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (112).
  • Klein, W. (Ed.). (1992). Textlinguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (86).
  • Klein, W. (Ed.). (2000). Sprache des Rechts [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (118).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1996). Sprache und Subjektivität I [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (101).
  • Klein, W., & Schlieben-Lange, B. (Eds.). (1996). Sprache und Subjektivität II [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (102).
  • Klein, W. (Ed.). (1996). Zweitspracherwerb [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (104).
  • Klein, W. (Ed.). (1982). Zweitspracherwerb [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (45).
  • Kuijpers, C., Van Donselaar, W., & Cutler, A. (1996). Phonological variation: Epenthesis and deletion of schwa in Dutch. In H. T. Bunnell (Ed.), Proceedings of the Fourth International Conference on Spoken Language Processing: Vol. 1 (pp. 94-97). New York: Institute of Electrical and Electronics Engineers.

    Abstract

    Two types of phonological variation in Dutch, resulting from optional rules, are schwa epenthesis and schwa deletion. In a lexical decision experiment it was investigated whether the phonological variants were processed similarly to the standard forms. It was found that the two types of variation patterned differently. Words with schwa epenthesis were processed faster and more accurately than the standard forms, whereas words with schwa deletion led to less fast and less accurate responses. The results are discussed in relation to the role of consonant-vowel alternations in speech processing and the perceptual integrity of onset clusters.
  • Kuzla, C., & Ernestus, M. (2007). Prosodic conditioning of phonetic detail of German plosives. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 461-464). Dudweiler: Pirrot.

    Abstract

    The present study investigates the influence of prosodic structure on the fine-grained phonetic details of German plosives which also cue the phonological fortis-lenis contrast. Closure durations were found to be longer at higher prosodic boundaries. There was also less glottal vibration in lenis plosives at higher prosodic boundaries. Voice onset time in lenis plosives was not affected by prosody. In contrast, for the fortis plosives VOT decreased at higher boundaries, as did the maximal intensity of the release. These results demonstrate that the effects of prosody on different phonetic cues can go into opposite directions, but are overall constrained by the need to maintain phonological contrasts. While prosodic effects on some cues are compatible with a ‘fortition’ account of prosodic strengthening or with a general feature enhancement explanation, the effects on others enhance paradigmatic contrasts only within a given prosodic position.
  • Lai, V. T., Chang, M., Duffield, C., Hwang, J., Xue, N., & Palmer, M. (2007). Defining a methodology for mapping Chinese and English sense inventories. In Proceedings of the 8th Chinese Lexical Semantics Workshop 2007 (CLSW 2007). The Hong Kong Polytechnic University, Hong Kong, May 21-23 (pp. 59-65).

    Abstract

    In this study, we explored methods for linking Chinese and English sense inventories using two opposing approaches: creating links (1) bottom-up: by starting at the finer-grained sense level then proceeding to the verb subcategorization frames and (2) top-down: by starting directly with the more coarse-grained frame levels. The sense inventories for linking include pre-existing corpora, such as English Propbank (Palmer, Gildea, and Kingsbury, 2005), Chinese Propbank (Xue and Palmer, 2004) and English WordNet (Fellbaum, 1998) and newly created corpora, the English and Chinese Sense Inventories from DARPA-GALE OntoNotes. In the linking task, we selected a group of highly frequent and polysemous communication verbs, including say, ask, talk, and speak in English, and shuo, biao-shi, jiang, and wen in Chinese. We found that with the bottom-up method, although speakers of both languages agreed on the links between senses, the subcategorization frames of the corresponding senses did not match consistently. With the top-down method, if the verb frames match in both languages, their senses line up more quickly to each other. The results indicate that the top-down method is more promising in linking English and Chinese sense inventories.
  • Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. (2000). On forgetful attractor network memories. In H. Malmgren, M. Borga, & L. Niklasson (Eds.), Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-1 Conference, Göteborg, Sweden, 13-16 May 2000 (pp. 54-62). Heidelberg: Springer Verlag.

    Abstract

    A recurrently connected attractor neural network with a Hebbian learning rule is currently our best ANN analogy for a piece cortex. Functionally biological memory operates on a spectrum of time scales with regard to induction and retention, and it is modulated in complex ways by sub-cortical neuromodulatory systems. Moreover, biological memory networks are commonly believed to be highly distributed and engage many co-operating cortical areas. Here we focus on the temporal aspects of induction and retention of memory in a connectionist type attractor memory model of a piece of cortex. A continuous time, forgetful Bayesian-Hebbian learning rule is described and compared to the characteristics of LTP and LTD seen experimentally. More generally, an attractor network implementing this learning rule can operate as a long-term, intermediate-term, or short-term memory. Modulation of the print-now signal of the learning rule replicates some experimental memory phenomena, like e.g. the von Restorff effect.
  • Lenkiewicz, P., Pereira, M., Freire, M., & Fernandes, J. (2008). Accelerating 3D medical image segmentation with high performance computing. In Proceedings of the IEEE International Workshops on Image Processing Theory, Tools and Applications - IPT (pp. 1-8).

    Abstract

    Digital processing of medical images has helped physicians and patients during past years by allowing examination and diagnosis on a very precise level. Nowadays possibly the biggest deal of support it can offer for modern healthcare is the use of high performance computing architectures to treat the huge amounts of data that can be collected by modern acquisition devices. This paper presents a parallel processing implementation of an image segmentation algorithm that operates on a computer cluster equipped with 10 processing units. Thanks to well-organized distribution of the workload we manage to significantly shorten the execution time of the developed algorithm and reach a performance gain very close to linear.
  • De León, L., & Levinson, S. C. (Eds.). (1992). Space in Mesoamerican languages [Special Issue]. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 45(6).
  • Levelt, W. J. M. (1991). Lexical access in speech production: Stages versus cascading. In H. Peters, W. Hulstijn, & C. Starkweather (Eds.), Speech motor control and stuttering (pp. 3-10). Amsterdam: Excerpta Medica.
  • Levinson, S. C. (2000). Language as nature and language as art. In J. Mittelstrass, & W. Singer (Eds.), Proceedings of the Symposium on ‘Changing concepts of nature and the turn of the Millennium (pp. 257-287). Vatican City: Pontificae Academiae Scientiarium Scripta Varia.
  • Levinson, S. C. (2000). H.P. Grice on location on Rossel Island. In S. S. Chang, L. Liaw, & J. Ruppenhofer (Eds.), Proceedings of the 25th Annual Meeting of the Berkeley Linguistic Society (pp. 210-224). Berkeley: Berkeley Linguistic Society.
  • Lucas, C., Griffiths, T., Xu, F., & Fawcett, C. (2008). A rational model of preference learning and choice prediction by children. In D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, & A. Culotta (Eds.), Advances in Neural Information Processing Systems.

    Abstract

    Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences.
  • Magyari, L., & De Ruiter, J. P. (2008). Timing in conversation: The anticipation of turn endings. In J. Ginzburg, P. Healey, & Y. Sato (Eds.), Proceedings of the 12th Workshop on the Semantics and Pragmatics Dialogue (pp. 139-146). London: King's college.

    Abstract

    We examined how communicators can switch between speaker and listener role with such accurate timing. During conversations, the majority of role transitions happens with a gap or overlap of only a few hundred milliseconds. This suggests that listeners can predict when the turn of the current speaker is going to end. Our hypothesis is that listeners know when a turn ends because they know how it ends. Anticipating the last words of a turn can help the next speaker in predicting when the turn will end, and also in anticipating the content of the turn, so that an appropriate response can be prepared in advance. We used the stimuli material of an earlier experiment (De Ruiter, Mitterer & Enfield, 2006), in which subjects were listening to turns from natural conversations and had to press a button exactly when the turn they were listening to ended. In the present experiment, we investigated if the subjects can complete those turns when only an initial fragment of the turn is presented to them. We found that the subjects made better predictions about the last words of those turns that had more accurate responses in the earlier button press experiment.
  • Majid, A., Van Staden, M., & Enfield, N. J. (2004). The human body in cognition, brain, and typology. In K. Hovie (Ed.), Forum Handbook, 4th International Forum on Language, Brain, and Cognition - Cognition, Brain, and Typology: Toward a Synthesis (pp. 31-35). Sendai: Tohoku University.

    Abstract

    The human body is unique: it is both an object of perception and the source of human experience. Its universality makes it a perfect resource for asking questions about how cognition, brain and typology relate to one another. For example, we can ask how speakers of different languages segment and categorize the human body. A dominant view is that body parts are “given” by visual perceptual discontinuities, and that words are merely labels for these visually determined parts (e.g., Andersen, 1978; Brown, 1976; Lakoff, 1987). However, there are problems with this view. First it ignores other perceptual information, such as somatosensory and motoric representations. By looking at the neural representations of sesnsory representations, we can test how much of the categorization of the human body can be done through perception alone. Second, we can look at language typology to see how much universality and variation there is in body-part categories. A comparison of a range of typologically, genetically and areally diverse languages shows that the perceptual view has only limited applicability (Majid, Enfield & van Staden, in press). For example, using a “coloring-in” task, where speakers of seven different languages were given a line drawing of a human body and asked to color in various body parts, Majid & van Staden (in prep) show that languages vary substantially in body part segmentation. For example, Jahai (Mon-Khmer) makes a lexical distinction between upper arm, lower arm, and hand, but Lavukaleve (Papuan Isolate) has just one word to refer to arm, hand, and leg. This shows that body part categorization is not a straightforward mapping of words to visually determined perceptual parts.
  • Majid, A., & Bowerman, M. (Eds.). (2007). Cutting and breaking events: A crosslinguistic perspective [Special Issue]. Cognitive Linguistics, 18(2).

    Abstract

    This special issue of Cognitive Linguistics explores the linguistic encoding of events of cutting and breaking. In this article we first introduce the project on which it is based by motivating the selection of this conceptual domain, presenting the methods of data collection used by all the investigators, and characterizing the language sample. We then present a new approach to examining crosslinguistic similarities and differences in semantic categorization. Applying statistical modeling to the descriptions of cutting and breaking events elicited from speakers of all the languages, we show that although there is crosslinguistic variation in the number of distinctions made and in the placement of category boundaries, these differences take place within a strongly constrained semantic space: across languages, there is a surprising degree of consensus on the partitioning of events in this domain. In closing, we compare our statistical approach with more conventional semantic analyses, and show how an extensional semantic typological approach like the one illustrated here can help illuminate the intensional distinctions made by languages.
  • Majid, A., Van Staden, M., Boster, J. S., & Bowerman, M. (2004). Event categorization: A cross-linguistic perspective. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 885-890). Mahwah, NJ: Erlbaum.

    Abstract

    Many studies in cognitive science address how people categorize objects, but there has been comparatively little research on event categorization. This study investigated the categorization of events involving material destruction, such as “cutting” and “breaking”. Speakers of 28 typologically, genetically, and areally diverse languages described events shown in a set of video-clips. There was considerable cross-linguistic agreement in the dimensions along which the events were distinguished, but there was variation in the number of categories and the placement of their boundaries.
  • Malaisé, V., Gazendam, L., & Brugman, H. (2007). Disambiguating automatic semantic annotation based on a thesaurus structure. In Proceedings of TALN 2007.
  • Matsuo, A. (2004). Young children's understanding of ongoing vs. completion in present and perfective participles. In J. v. Kampen, & S. Baauw (Eds.), Proceedings of GALA 2003 (pp. 305-316). Utrecht: Netherlands Graduate School of Linguistics (LOT).
  • McCafferty, S. G., & Gullberg, M. (Eds.). (2008). Gesture and SLA: Toward an integrated approach [Special Issue]. Studies in Second Language Acquisition, 30(2).
  • McQueen, J. M., & Cutler, A. (1998). Spotting (different kinds of) words in (different kinds of) context. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2791-2794). Sydney: ICSLP.

    Abstract

    The results of a word-spotting experiment are presented in which Dutch listeners tried to spot different types of bisyllabic Dutch words embedded in different types of nonsense contexts. Embedded verbs were not reliably harder to spot than embedded nouns; this suggests that nouns and verbs are recognised via the same basic processes. Iambic words were no harder to spot than trochaic words, suggesting that trochaic words are not in principle easier to recognise than iambic words. Words were harder to spot in consonantal contexts (i.e., contexts which themselves could not be words) than in longer contexts which contained at least one vowel (i.e., contexts which, though not words, were possible words of Dutch). A control experiment showed that this difference was not due to acoustic differences between the words in each context. The results support the claim that spoken-word recognition is sensitive to the viability of sound sequences as possible words.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Positive and negative influences of the lexicon on phonemic decision-making. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 778-781). Beijing: China Military Friendship Publish.

    Abstract

    Lexical knowledge influences how human listeners make decisions about speech sounds. Positive lexical effects (faster responses to target sounds in words than in nonwords) are robust across several laboratory tasks, while negative effects (slower responses to targets in more word-like nonwords than in less word-like nonwords) have been found in phonetic decision tasks but not phoneme monitoring tasks. The present experiments tested whether negative lexical effects are therefore a task-specific consequence of the forced choice required in phonetic decision. We compared phoneme monitoring and phonetic decision performance using the same Dutch materials in each task. In both experiments there were positive lexical effects, but no negative lexical effects. We observe that in all studies showing negative lexical effects, the materials were made by cross-splicing, which meant that they contained perceptual evidence supporting the lexically-consistent phonemes. Lexical knowledge seems to influence phonemic decision-making only when there is evidence for the lexically-consistent phoneme in the speech signal.
  • McQueen, J. M., Cutler, A., & Norris, D. (2000). Why Merge really is autonomous and parsimonious. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 47-50). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    We briefly describe the Merge model of phonemic decision-making, and, in the light of general arguments about the possible role of feedback in spoken-word recognition, defend Merge's feedforward structure. Merge not only accounts adequately for the data, without invoking feedback connections, but does so in a parsimonious manner.
  • McQueen, J. M., & Cutler, A. (1992). Words within words: Lexical statistics and lexical access. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 221-224). Alberta: University of Alberta.

    Abstract

    This paper presents lexical statistics on the pattern of occurrence of words embedded in other words. We report the results of an analysis of 25000 words, varying in length from two to six syllables, extracted from a phonetically-coded English dictionary (The Longman Dictionary of Contemporary English). Each syllable, and each string of syllables within each word was checked against the dictionary. Two analyses are presented: the first used a complete list of polysyllables, with look-up on the entire dictionary; the second used a sublist of content words, counting only embedded words which were themselves content words. The results have important implications for models of human speech recognition. The efficiency of these models depends, in different ways, on the number and location of words within words.
  • Mitterer, H. (2007). Top-down effects on compensation for coarticulation are not replicable. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1601-1604). Adelaide: Causal Productions.

    Abstract

    Listeners use lexical knowledge to judge what speech sounds they heard. I investigated whether such lexical influences are truly top-down or just reflect a merging of perceptual and lexical constraints. This is achieved by testing whether the lexically determined identity of a phone exerts the appropriate context effects on surrounding phones. The current investigations focuses on compensation for coarticulation in vowel-fricative sequences, where the presence of a rounded vowel (/y/ rather than /i/) leads fricatives to be perceived as /s/ rather than //. This results was consistently found in all three experiments. A vowel was also more likely to be perceived as rounded /y/ if that lead listeners to be perceive words rather than nonwords (Dutch: meny, English id. vs. meni nonword). This lexical influence on the perception of the vowel had, however, no consistent influence on the perception of following fricative.
  • Mitterer, H., & McQueen, J. M. (2007). Tracking perception of pronunciation variation by tracking looks to printed words: The case of word-final /t/. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1929-1932). Dudweiler: Pirrot.

    Abstract

    We investigated perception of words with reduced word-final /t/ using an adapted eyetracking paradigm. Dutch listeners followed spoken instructions to click on printed words which were accompanied on a computer screen by simple shapes (e.g., a circle). Targets were either above or next to their shapes, and the shapes uniquely identified the targets when the spoken forms were ambiguous between words with or without final /t/ (e.g., bult, bump, vs. bul, diploma). Analysis of listeners’ eye-movements revealed, in contrast to earlier results, that listeners use the following segmental context when compensating for /t/-reduction. Reflecting that /t/-reduction is more likely to occur before bilabials, listeners were more likely to look at the /t/-final words if the next word’s first segment was bilabial. This result supports models of speech perception in which prelexical phonological processes use segmental context to modulate word recognition.
  • Mitterer, H. (2007). Behavior reflects the (degree of) reality of phonological features in the brain as well. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 127-130). Dudweiler: Pirrot.

    Abstract

    To assess the reality of phonological features in language processing (vs. language description), one needs to specify the distinctive claims of distinctive-feature theory. Two of the more farreaching claims are compositionality and generalizability. I will argue that there is some evidence for the first and evidence against the second claim from a recent behavioral paradigm. Highlighting the contribution of a behavioral paradigm also counterpoints the use of brain measures as the only way to elucidate what is "real for the brain". The contributions of the speakers exemplify how brain measures can help us to understand the reality of phonological features in language processing. The evidence is, however, not convincing for a) the claim for underspecification of phonological features—which has to deal with counterevidence from behavioral as well as brain measures—, and b) the claim of position independence of phonological features.
  • Mitterer, H. (2008). How are words reduced in spontaneous speech? In A. Botonis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (pp. 165-168). Athens: University of Athens.

    Abstract

    Words are reduced in spontaneous speech. If reductions are constrained by functional (i.e., perception and production) constraints, they should not be arbitrary. This hypothesis was tested by examing the pronunciations of high- to mid-frequency words in a Dutch and a German spontaneous speech corpus. In logistic-regression models the "reduction likelihood" of a phoneme was predicted by fixed-effect predictors such as position within the word, word length, word frequency, and stress, as well as random effects such as phoneme identity and word. The models for Dutch and German show many communalities. This is in line with the assumption that similar functional constraints influence reductions in both languages.
  • Narasimhan, B., Eisenbeiss, S., & Brown, P. (Eds.). (2007). The linguistic encoding of multiple-participant events [Special Issue]. Linguistics, 45(3).

    Abstract

    This issue investigates the linguistic encoding of events with three or more participants from the perspectives of language typology and acquisition. Such “multiple-participant events” include (but are not limited to) any scenario involving at least three participants, typically encoded using transactional verbs like 'give' and 'show', placement verbs like 'put', and benefactive and applicative constructions like 'do (something for someone)', among others. There is considerable crosslinguistic and withinlanguage variation in how the participants (the Agent, Causer, Theme, Goal, Recipient, or Experiencer) and the subevents involved in multipleparticipant situations are encoded, both at the lexical and the constructional levels
  • Norris, D., Cutler, A., McQueen, J. M., Butterfield, S., & Kearns, R. K. (2000). Language-universal constraints on the segmentation of English. In A. Cutler, J. M. McQueen, & R. Zondervan (Eds.), Proceedings of SWAP (Workshop on Spoken Word Access Processes) (pp. 43-46). Nijmegen: Max-Planck-Institute for Psycholinguistics.

    Abstract

    Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) [1] is a language-specific or language-universal strategy for the segmentation of continuous speech. The PWC disfavours parses which leave an impossible residue between the end of a candidate word and a known boundary. The experiments examined cases where the residue was either a CV syllable with a lax vowel, or a CVC syllable with a schwa. Although neither syllable context is a possible word in English, word-spotting in both contexts was easier than with a context consisting of a single consonant. The PWC appears to be language-universal rather than language-specific.
  • Norris, D., Van Ooijen, B., & Cutler, A. (1992). Speeded detection of vowels and steady-state consonants. In J. Ohala, T. Neary, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing; Vol. 2 (pp. 1055-1058). Alberta: University of Alberta.

    Abstract

    We report two experiments in which vowels and steady-state consonants served as targets in a speeded detection task. In the first experiment, two vowels were compared with one voiced and once unvoiced fricative. Response times (RTs) to the vowels were longer than to the fricatives. The error rate was higher for the consonants. Consonants in word-final position produced the shortest RTs, For the vowels, RT correlated negatively with target duration. In the second experiment, the same two vowel targets were compared with two nasals. This time there was no significant difference in RTs, but the error rate was still significantly higher for the consonants. Error rate and length correlated negatively for the vowels only. We conclude that RT differences between phonemes are independent of vocalic or consonantal status. Instead, we argue that the process of phoneme detection reflects more finely grained differences in acoustic/articulatory structure within the phonemic repertoire.
  • Norris, D., Cutler, A., & McQueen, J. M. (2000). The optimal architecture for simulating spoken-word recognition. In C. Davis, T. Van Gelder, & R. Wales (Eds.), Cognitive Science in Australia, 2000: Proceedings of the Fifth Biennial Conference of the Australasian Cognitive Science Society. Adelaide: Causal Productions.

    Abstract

    Simulations explored the inability of the TRACE model of spoken-word recognition to model the effects on human listening of subcategorical mismatch in word forms. The source of TRACE's failure lay not in interactive connectivity, not in the presence of inter-word competition, and not in the use of phonemic representations, but in the need for continuously optimised interpretation of the input. When an analogue of TRACE was allowed to cycle to asymptote on every slice of input, an acceptable simulation of the subcategorical mismatch data was achieved. Even then, however, the simulation was not as close as that produced by the Merge model, which has inter-word competition, phonemic representations and continuous optimisation (but no interactive connectivity).
  • Omar, R., Henley, S. M., Hailstone, J. C., Sauter, D., Scott, S. K., Fox, N. C., Rossor, M. N., & Warren, J. D. (2007). Recognition of emotions in faces, voices and music in frontotemporal lobar regeneration [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 78(9), 1014.

    Abstract

    Frontotemporal lobar degeneration (FTLD) is a group of neurodegenerative conditions characterised by focal frontal and/or temporal lobe atrophy. Patients develop a range of cognitive and behavioural abnormalities, including prominent difficulties in comprehending and expressing emotions, with significant clinical and social consequences. Here we report a systematic prospective analysis of emotion processing in different input modalities in patients with FTLD. We examined recognition of happiness, sadness, fear and anger in facial expressions, non-verbal vocalisations and music in patients with FTLD and in healthy age matched controls. The FTLD group was significantly impaired in all modalities compared with controls, and this effect was most marked for music. Analysing each emotion separately, recognition of negative emotions was impaired in all three modalities in FTLD, and this effect was most marked for fear and anger. Recognition of happiness was deficient only with music. Our findings support the idea that FTLD causes impaired recognition of emotions across input channels, consistent with a common central representation of emotion concepts. Music may be a sensitive probe of emotional deficits in FTLD, perhaps because it requires a more abstract representation of emotion than do animate stimuli such as faces and voices.
  • Otake, T., & Cutler, A. (2000). A set of Japanese word cohorts rated for relative familiarity. In B. Yuan, T. Huang, & X. Tang (Eds.), Proceedings of the Sixth International Conference on Spoken Language Processing: Vol. 3 (pp. 766-769). Beijing: China Military Friendship Publish.

    Abstract

    A database is presented of relative familiarity ratings for 24 sets of Japanese words, each set comprising words overlapping in the initial portions. These ratings are useful for the generation of material sets for research in the recognition of spoken words.
  • Ozturk, O., & Papafragou, A. (2008). Acquisition of evidentiality and source monitoring. In H. Chan, H. Jacob, & E. Kapia (Eds.), Proceedings from the 32nd Annual Boston University Conference on Language Development [BUCLD 32] (pp. 368-377). Somerville, Mass.: Cascadilla Press.
  • Ozyurek, A. (1998). An analysis of the basic meaning of Turkish demonstratives in face-to-face conversational interaction. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralite et gestualite: Communication multimodale, interaction: actes du colloque ORAGE 98 (pp. 609-614). Paris: L'Harmattan.
  • Ozyurek, A., & Ozcaliskan, S. (2000). How do children learn to conflate manner and path in their speech and gestures? Differences in English and Turkish. In E. V. Clark (Ed.), The proceedings of the Thirtieth Child Language Research Forum (pp. 77-85). Stanford: CSLI Publications.
  • Papafragou, A., & Ozturk, O. (2007). Children's acquisition of modality. In Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America (GALANA 2) (pp. 320-327). Somerville, Mass.: Cascadilla Press.
  • Papafragou, A. (2007). On the acquisition of modality. In T. Scheffler, & L. Mayol (Eds.), Penn Working Papers in Linguistics. Proceedings of the 30th Annual Penn Linguistics Colloquium (pp. 281-293). Department of Linguistics, University of Pennsylvania.
  • Petersson, K. M. (2008). On cognition, structured sequence processing, and adaptive dynamical systems. American Institute of Physics Conference Proceedings, 1060(1), 195-200.

    Abstract

    Cognitive neuroscience approaches the brain as a cognitive system: a system that functionally is conceptualized in terms of information processing. We outline some aspects of this concept and consider a physical system to be an information processing device when a subclass of its physical states can be viewed as representational/cognitive and transitions between these can be conceptualized as a process operating on these states by implementing operations on the corresponding representational structures. We identify a generic and fundamental problem in cognition: sequentially organized structured processing. Structured sequence processing provides the brain, in an essential sense, with its processing logic. In an approach addressing this problem, we illustrate how to integrate levels of analysis within a framework of adaptive dynamical systems. We note that the dynamical system framework lends itself to a description of asynchronous event-driven devices, which is likely to be important in cognition because the brain appears to be an asynchronous processing system. We use the human language faculty and natural language processing as a concrete example through out.
  • Rapold, C. J. (2007). From demonstratives to verb agreement in Benchnon: A diachronic perspective. In A. Amha, M. Mous, & G. Savà (Eds.), Omotic and Cushitic studies: Papers from the Fourth Cushitic Omotic Conference, Leiden, 10-12 April 2003 (pp. 69-88). Cologne: Rüdiger Köppe.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). The strength of stress-related lexical competition depends on the presence of first-syllable stress. In Proceedings of Interspeech 2008 (pp. 1954-1954).

    Abstract

    Dutch listeners' looks to printed words were tracked while they listened to instructions to click with their mouse on one of them. When presented with targets from word pairs where the first two syllables were segmentally identical but differed in stress location, listeners used stress information to recognize the target before segmental information disambiguated the words. Furthermore, the amount of lexical competition was influenced by the presence or absence of word-initial stress.
  • Reinisch, E., Jesse, A., & McQueen, J. M. (2008). Lexical stress information modulates the time-course of spoken-word recognition. In Proceedings of Acoustics' 08 (pp. 3183-3188).

    Abstract

    Segmental as well as suprasegmental information is used by Dutch listeners to recognize words. The time-course of the effect of suprasegmental stress information on spoken-word recognition was investigated in a previous study, in which we tracked Dutch listeners' looks to arrays of four printed words as they listened to spoken sentences. Each target was displayed along with a competitor that did not differ segmentally in its first two syllables but differed in stress placement (e.g., 'CENtimeter' and 'sentiMENT'). The listeners' eye-movements showed that stress information is used to recognize the target before distinct segmental information is available. Here, we examine the role of durational information in this effect. Two experiments showed that initial-syllable duration, as a cue to lexical stress, is not interpreted dependent on the speaking rate of the preceding carrier sentence. This still held when other stress cues like pitch and amplitude were removed. Rather, the speaking rate of the preceding carrier affected the speed of word recognition globally, even though the rate of the target itself was not altered. Stress information modulated lexical competition, but did so independently of the rate of the preceding carrier, even if duration was the only stress cue present.
  • Ringersma, J., & Kemps-Snijders, M. (2007). Creating multimedia dictionaries of endangered languages using LEXUS. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 65-68). Baixas, France: ISCA-Int.Speech Communication Assoc.

    Abstract

    This paper reports on the development of a flexible web based lexicon tool, LEXUS. LEXUS is targeted at linguists involved in language documentation (of endangered languages). It allows the creation of lexica within the structure of the proposed ISO LMF standard and uses the proposed concept naming conventions from the ISO data categories, thus enabling interoperability, search and merging. LEXUS also offers the possibility to visualize language, since it provides functionalities to include audio, video and still images to the lexicon. With LEXUS it is possible to create semantic network knowledge bases, using typed relations. The LEXUS tool is free for use. Index Terms: lexicon, web based application, endangered languages, language documentation.
  • Robotham, L., Trinkler, I., & Sauter, D. (2008). The power of positives: Evidence for an overall emotional recognition deficit in Huntington's disease [Abstract]. Journal of Neurology, Neurosurgery & Psychiatry, 79, A12.

    Abstract

    The recognition of emotions of disgust, anger and fear have been shown to be significantly impaired in Huntington’s disease (eg,Sprengelmeyer et al, 1997, 2006; Gray et al, 1997; Milders et al, 2003,Montagne et al, 2006; Johnson et al, 2007; De Gelder et al, 2008). The relative impairment of these emotions might have implied a recognition impairment specific to negative emotions. Could the asymmetric recognition deficits be due not to the complexity of the emotion but rather reflect the complexity of the task? In the current study, 15 Huntington’s patients and 16 control subjects were presented with negative and positive non-speech emotional vocalisations that were to be identified as anger, fear, sadness, disgust, achievement, pleasure and amusement in a forced-choice paradigm. This experiment more accurately matched the negative emotions with positive emotions in a homogeneous modality. The resulting dually impaired ability of Huntington’s patients to identify negative and positive non-speech emotional vocalisations correctly provides evidence for an overall emotional recognition deficit in the disease. These results indicate that previous findings of a specificity in emotional recognition deficits might instead be due to the limitations of the visual modality. Previous experiments may have found an effect of emotional specificy due to the presence of a single positive emotion, happiness, in the midst of multiple negative emotions. In contrast with the previous literature, the study presented here points to a global deficit in the recognition of emotional sounds.
  • De Ruiter, J. P. (2007). Some multimodal signals in humans. In I. Van de Sluis, M. Theune, E. Reiter, & E. Krahmer (Eds.), Proceedings of the Workshop on Multimodal Output Generation (MOG 2007) (pp. 141-148).

    Abstract

    In this paper, I will give an overview of some well-studied multimodal signals that humans produce while they communicate with other humans, and discuss the implications of those studies for HCI. I will first discuss a conceptual framework that allows us to distinguish between functional and sensory modalities. This distinction is important, as there are multiple functional modalities using the same sensory modality (e.g., facial expression and eye-gaze in the visual modality). A second theoretically important issue is redundancy. Some signals appear to be redundant with a signal in another modality, whereas others give new information or even appear to give conflicting information (see e.g., the work of Susan Goldin-Meadows on speech accompanying gestures). I will argue that multimodal signals are never truly redundant. First, many gestures that appear at first sight to express the same meaning as the accompanying speech generally provide extra (analog) information about manner, path, etc. Second, the simple fact that the same information is expressed in more than one modality is itself a communicative signal. Armed with this conceptual background, I will then proceed to give an overview of some multimodalsignals that have been investigated in human-human research, and the level of understanding we have of the meaning of those signals. The latter issue is especially important for potential implementations of these signals in artificial agents. First, I will discuss pointing gestures. I will address the issue of the timing of pointing gestures relative to the speech it is supposed to support, the mutual dependency between pointing gestures and speech, and discuss the existence of alternative ways of pointing from other cultures. The most frequent form of pointing that does not involve the index finger is a cultural practice called lip-pointing which employs two visual functional modalities, mouth-shape and eye-gaze, simultaneously for pointing. Next, I will address the issue of eye-gaze. A classical study by Kendon (1967) claims that there is a systematic relationship between eye-gaze (at the interlocutor) and turn-taking states. Research at our institute has shown that this relationship is weaker than has often been assumed. If the dialogue setting contains a visible object that is relevant to the dialogue (e.g., a map), the rate of eye-gaze-at-other drops dramatically and its relationship to turn taking disappears completely. The implications for machine generated eye-gaze are discussed. Finally, I will explore a theoretical debate regarding spontaneous gestures. It has often been claimed that the class of gestures that is called iconic by McNeill (1992) are a “window into the mind”. That is, they are claimed to give the researcher (or even the interlocutor) a direct view into the speaker’s thought, without being obscured by the complex transformation that take place when transforming a thought into a verbal utterance. I will argue that this is an illusion. Gestures can be shown to be specifically designed such that the listener can be expected to interpret them. Although the transformations carried out to express a thought in gesture are indeed (partly) different from the corresponding transformations for speech, they are a) complex, and b) severely understudied. This obviously has consequences both for the gesture research agenda, and for the generation of iconic gestures by machines.
  • De Ruiter, J. P. (2004). On the primacy of language in multimodal communication. In Workshop Proceedings on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces.(LREC2004) (pp. 38-41). Paris: ELRA - European Language Resources Association (CD-ROM).

    Abstract

    In this paper, I will argue that although the study of multimodal interaction offers exciting new prospects for Human Computer Interaction and human-human communication research, language is the primary form of communication, even in multimodal systems. I will support this claim with theoretical and empirical arguments, mainly drawn from human-human communication research, and will discuss the implications for multimodal communication research and Human-Computer Interaction.
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • De Ruiter, J. P., & Enfield, N. J. (2007). The BIC model: A blueprint for the communicator. In C. Stephanidis (Ed.), Universal access in Human-Computer Interaction: Applications and services (pp. 251-258). Berlin: Springer.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., Ernestus, M., & Wan, V. (2007). Segmentation of speech: Child's play? In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1953-1956). Adelaide: Causal Productions.

    Abstract

    The difficulty of the task of segmenting a speech signal into its words is immediately clear when listening to a foreign language; it is much harder to segment the signal into its words, since the words of the language are unknown. Infants are faced with the same task when learning their first language. This study provides a better understanding of the task that infants face while learning their native language. We employed an automatic algorithm on the task of speech segmentation without prior knowledge of the labels of the phonemes. An analysis of the boundaries erroneously placed inside a phoneme showed that the algorithm consistently placed additional boundaries in phonemes in which acoustic changes occur. These acoustic changes may be as great as the transition from the closure to the burst of a plosive or as subtle as the formant transitions in low or back vowels. Moreover, we found that glottal vibration may attenuate the relevance of acoustic changes within obstruents. An interesting question for further research is how infants learn to overcome the natural tendency to segment these ‘dynamic’ phonemes.
  • Scharenborg, O., & Wan, V. (2007). Can unquantised articulatory feature continuums be modelled? In INTERSPEECH 2007 - 8th Annual Conference of the International Speech Communication Association (pp. 2473-2476). ISCA Archive.

    Abstract

    Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Although termed ‘articulatory’, previous definitions make certain assumptions that are invalid, for instance, that articulators ‘hop’ from one fixed position to the next. In this paper, we studied two methods, based on support vector classification (SVC) and regression (SVR), in which the articulation continuum is modelled without being restricted to using discrete AF value classes. A comparison with a baseline system trained on quantised values of the articulation continuum showed that both SVC and SVR outperform the baseline for two of the three investigated AFs, with improvements up to 5.6% absolute.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O., Bouwman, G., & Boves, L. (2000). Connected digit recognition with class specific word models. In Proceedings of the COST249 Workshop on Voice Operated Telecom Services workshop (pp. 71-74).

    Abstract

    This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utterance, pauses in the vicinity of the word, and combinations of these. Comparative experiments were carried out on a corpus consisting of Dutch spoken connected digit strings and isolated digits, which are recorded in a wide variety of acoustic conditions. The results show, that classification based on gender of the speaker, position of the digit in the string, pauses in the vicinity of the training tokens, and models based on a combination of these criteria perform significantly better than the set with single models per digit.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Scheu, O., & Zinn, C. (2007). How did the e-learning session go? The student inspector. In Proceedings of the 13th International Conference on Artificial Intelligence and Education (AIED 2007). Amsterdam: IOS Press.

    Abstract

    Good teachers know their students, and exploit this knowledge to adapt or optimise their instruction. Traditional teachers know their students because they interact with them face-to-face in classroom or one-to-one tutoring sessions. In these settings, they can build student models, i.e., by exploiting the multi-faceted nature of human-human communication. In distance-learning contexts, teacher and student have to cope with the lack of such direct interaction, and this must have detrimental effects for both teacher and student. In a past study we have analysed teacher requirements for tracking student actions in computer-mediated settings. Given the results of this study, we have devised and implemented a tool that allows teachers to keep track of their learners'interaction in e-learning systems. We present the tool's functionality and user interfaces, and an evaluation of its usability.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schulte im Walde, S., Melinger, A., Roth, M., & Weber, A. (2007). An empirical characterization of response types in German association norms. In Proceedings of the GLDV workshop on lexical-semantic and ontological resources.
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, D. R., & Cutler, A. (1982). Segmental cues to syntactic structure. In Proceedings of the Institute of Acoustics 'Spectral Analysis and its Use in Underwater Acoustics' (pp. E3.1-E3.4). London: Institute of Acoustics.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Senft, G. (1991). Bakavilisi Biga - we can 'turn' the language - or: What happens to English words in Kilivila language? In W. Bahner, J. Schildt, & D. Viehwegger (Eds.), Proceedings of the XIVth International Congress of Linguists (pp. 1743-1746). Berlin: Akademie Verlag.
  • Senft, G. (2000). COME and GO in Kilivila. In B. Palmer, & P. Geraghty (Eds.), SICOL. Proceedings of the second international conference on Oceanic linguistics: Volume 2, Historical and descriptive studies (pp. 105-136). Canberra: Pacific Linguistics.
  • Senft, G. (2007). Language, culture and cognition: Frames of spatial reference and why we need ontologies of space [Abstract]. In A. G. Cohn, C. Freksa, & B. Bebel (Eds.), Spatial cognition: Specialization and integration (pp. 12).

    Abstract

    One of the many results of the "Space" research project conducted at the MPI for Psycholinguistics is that there are three "Frames of spatial Reference" (FoRs), the relative, the intrinsic and the absolute FoR. Cross-linguistic research showed that speakers who prefer one FoR in verbal spatial references rely on a comparable coding system for memorizing spatial configurations and for making inferences with respect to these spatial configurations in non-verbal problem solving. Moreover, research results also revealed that in some languages these verbal FoRs also influence gestural behavior. These results document the close interrelationship between language, culture and cognition in the domain "Space". The proper description of these interrelationships in the spatial domain requires language and culture specific ontologies.
  • Seuren, P. A. M. (1991). Notes on noun phrases and quantification. In Proceedings of the International Conference on Current Issues in Computational Linguistics (pp. 19-44). Penang, Malaysia: Universiti Sains Malaysia.
  • Seuren, P. A. M. (1982). Riorientamenti metodologici nello studio della variabilità linguistica. In D. Gambarara, & A. D'Atri (Eds.), Ideologia, filosofia e linguistica: Atti del Convegno Internazionale di Studi, Rende (CS) 15-17 Settembre 1978 ( (pp. 499-515). Roma: Bulzoni.
  • Seuren, P. A. M. (1996). What a universal semantic interlingua can do. In A. Zamulin (Ed.), Perspectives of System Informatics. Proceedings of the Andrei Ershov Second International Memorial Conference, Novosibirsk, Akademgorodok, June 25-28,1996 (pp. 41-42). Novosibirsk: A.P. Ershov Institute of Informatics Systems.
  • Seuren, P. A. M. (1991). What makes a text untranslatable? In H. M. N. Noor Ein, & H. S. Atiah (Eds.), Pragmatik Penterjemahan: Prinsip, Amalan dan Penilaian Menuju ke Abad 21 ("The Pragmatics of Translation: Principles, Practice and Evaluation Moving towards the 21st Century") (pp. 19-27). Kuala Lumpur: Dewan Bahasa dan Pustaka.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN

Share this page