Publications

Displaying 101 - 140 of 140
  • Ozyurek, A. (1994). How children talk about conversations: Development of roles and voices. In E. V. Clark (Ed.), Proceedings of the Twenty-Sixth Annual Child Language Research Forum (pp. 197-206). Stanford: CSLI Publications.
  • Ozyurek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn, & S. C. Stoness (Eds.), Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society (pp. 507-512). London: Erlbaum.
  • Ozyurek, A. (2001). What do speech-gesture mismatches reveal about language specific processing? A comparison of Turkish and English. In C. Cavé, I. Guaitella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication: Actes du Colloque ORAGE 2001 (pp. 567-581). Paris: L'Harmattan.
  • Papafragou, A., & Ozturk, O. (2006). The acquisition of epistemic modality. In A. Botinis (Ed.), Proceedings of ITRW on Experimental Linguistics in ExLing-2006 (pp. 201-204). ISCA Archive.

    Abstract

    In this paper we try to contribute to the body of knowledge about the acquisition of English epistemic modal verbs (e.g. Mary may/has to be at school). Semantically, these verbs encode possibility or necessity with respect to available evidence. Pragmatically, the use of epistemic modals often gives rise to scalar conversational inferences (Mary may be at school -> Mary doesn’t have to be at school). The acquisition of epistemic modals is challenging for children on both these levels. In this paper, we present findings from two studies which were conducted with 5-year-old children and adults. Our findings, unlike previous work, show that 5-yr-olds have mastered epistemic modal semantics, including the notions of necessity and possibility. However, they are still in the process of acquiring epistemic modal pragmatics.
  • Pereiro Estevan, Y., Wan, V., Scharenborg, O., & Gallardo Antolín, A. (2006). Segmentación de fonemas no supervisada basada en métodos kernel de máximo margen. In Proceedings of IV Jornadas en Tecnología del Habla.

    Abstract

    En este artículo se desarrolla un método automático de segmentación de fonemas no supervisado. Este método utiliza el algoritmo de agrupación de máximo margen [1] para realizar segmentación de fonemas sobre habla continua sin necesidad de información a priori para el entrenamiento del sistema.
  • Petersson, K. M., Grenholm, P., & Forkstam, C. (2005). Artificial grammar learning and neural networks. In G. B. Bruna, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Conference of the Cognitive Science Society (pp. 1726-1731).

    Abstract

    Recent FMRI studies indicate that language related brain regions are engaged in artificial grammar (AG) processing. In the present study we investigate the Reber grammar by means of formal analysis and network simulations. We outline a new method for describing the network dynamics and propose an approach to grammar extraction based on the state-space dynamics of the network. We conclude that statistical frequency-based and rule-based acquisition procedures can be viewed as complementary perspectives on grammar learning, and more generally, that classical cognitive models can be viewed as a special case of a dynamical systems perspective on information processing
  • Pluymaekers, M., Ernestus, M., Baayen, R. H., & Booij, G. (2006). The role of morphology in fine phonetic detail: The case of Dutch -igheid. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 53-54).
  • Pluymaekers, M., Ernestus, M., & Baayen, R. H. (2006). Effects of word frequency on the acoustic durations of affixes. In Proceedings of Interspeech 2006 (pp. 953-956). Pittsburgh: ICSLP.

    Abstract

    This study investigates whether the acoustic durations of derivational affixes in Dutch are affected by the frequency of the word they occur in. In a word naming experiment, subjects were presented with a large number of words containing one of the affixes ge-, ver-, ont, or -lijk. Their responses were recorded on DAT tapes, and the durations of the affixes were measured using Automatic Speech Recognition technology. To investigate whether frequency also affected durations when speech rate was high, the presentation rate of the stimuli was varied. The results show that a higher frequency of the word as a whole led to shorter acoustic realizations for all affixes. Furthermore, affixes became shorter as the presentation rate of the stimuli increased. There was no interaction between word frequency and presentation rate, suggesting that the frequency effect also applies in situations in which the speed of articulation is very high.
  • Poletiek, F. H., & Chater, N. (2006). Grammar induction profits from representative stimulus sampling. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 1968-1973). Austin, TX, USA: Cognitive Science Society.
  • Poletiek, F. H., & Rassin E. (Eds.). (2005). Het (on)bewuste [Special Issue]. De Psycholoog.
  • Sauter, D., Wiland, J., Warren, J., Eisner, F., Calder, A., & Scott, S. K. (2005). Sounds of joy: An investigation of vocal expressions of positive emotions [Abstract]. Journal of Cognitive Neuroscience, 61(Supplement), B99.

    Abstract

    A series of experiment tested Ekman’s (1992) hypothesis that there are a set of positive basic emotions that are expressed using vocal para-linguistic sounds, e.g. laughter and cheers. The proposed categories investigated were amusement, contentment, pleasure, relief and triumph. Behavioural testing using a forced-choice task indicated that participants were able to reliably recognize vocal expressions of the proposed emotions. A cross-cultural study in the preliterate Himba culture in Namibia confirmed that these categories are also recognized across cultures. A recognition test of acoustically manipulated emotional vocalizations established that the recognition of different emotions utilizes different vocal cues, and that these in turn differ from the cues used when comprehending speech. In a study using fMRI we found that relative to a signal correlated noise baseline, the paralinguistic expressions of emotion activated bilateral superior temporal gyri and sulci, lateral and anterior to primary auditory cortex, which is consistent with the processing of non linguistic vocal cues in the auditory ‘what’ pathway. Notably amusement was associated with greater activation extending into both temporal poles and amygdale and insular cortex. Overall, these results support the claim that ‘happiness’ can be fractionated into amusement, pleasure, relief and triumph.
  • Scharenborg, O., Sturm, J., & Boves, L. (2001). Business listings in automatic directory assistance. In Interspeech - Eurospeech 2001 - 7th European Conference on Speech Communication and Technology (pp. 2381-2384). ISCA Archive.

    Abstract

    So far most attempts to automate Directory Assistance services focused on private listings, because it is not known precisely how callers will refer to a business listings. The research described in this paper, carried out in the SMADA project, tries to fill this gap. The aim of the research is to model the expressions people use when referring to a business listing by means of rules, in order to automatically create a vocabulary, which can be part of an automated DA service. In this paper a rule-base procedure is proposed, which derives rules from the expressions people use. These rules are then used to automatically create expressions from directory listings. Two categories of businesses, viz. hospitals and the hotel and catering industry, are used to explain this procedure. Results for these two categories are used to discuss the problem of the over- and undergeneration of expressions.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., & Seneff, S. (2005). A two-pass strategy for handling OOVs in a large vocabulary recognition task. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, (pp. 1669-1672). ISCA Archive.

    Abstract

    This paper addresses the issue of large-vocabulary recognition in a specific word class. We propose a two-pass strategy in which only major cities are explicitly represented in the first stage lexicon. An unknown word model encoded as a phone loop is used to detect OOV city names (referred to as rare city names). After which SpeM, a tool that can extract words and word-initial cohorts from phone graphs on the basis of a large fallback lexicon, provides an N-best list of promising city names on the basis of the phone sequences generated in the first stage. This N-best list is then inserted into the second stage lexicon for a subsequent recognition pass. Experiments were conducted on a set of spontaneous telephone-quality utterances each containing one rare city name. We tested the size of the N-best list and three types of language models (LMs). The experiments showed that SpeM was able to include nearly 85% of the correct city names into an N-best list of 3000 city names when a unigram LM, which also boosted the unigram scores of a city name in a given state, was used.
  • Scharenborg, O. (2005). Parallels between HSR and ASR: How ASR can contribute to HSR. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1237-1240). ISCA Archive.

    Abstract

    In this paper, we illustrate the close parallels between the research fields of human speech recognition (HSR) and automatic speech recognition (ASR) using a computational model of human word recognition, SpeM, which was built using techniques from ASR. We show that ASR has proven to be useful for improving models of HSR by relieving them of some of their shortcomings. However, in order to build an integrated computational model of all aspects of HSR, a lot of issues remain to be resolved. In this process, ASR algorithms and techniques definitely can play an important role.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Senft, G. (1991). Bakavilisi Biga - we can 'turn' the language - or: What happens to English words in Kilivila language? In W. Bahner, J. Schildt, & D. Viehwegger (Eds.), Proceedings of the XIVth International Congress of Linguists (pp. 1743-1746). Berlin: Akademie Verlag.
  • Seuren, P. A. M. (2001). Lexical meaning and metaphor. In E. N. Enikö (Ed.), Cognition in language use (pp. 422-431). Antwerp, Belgium: International Pragmatics Association (IPrA).
  • Seuren, P. A. M. (1991). Notes on noun phrases and quantification. In Proceedings of the International Conference on Current Issues in Computational Linguistics (pp. 19-44). Penang, Malaysia: Universiti Sains Malaysia.
  • Seuren, P. A. M. (1971). Qualche osservazione sulla frase durativa e iterativa in italiano. In M. Medici, & R. Simone (Eds.), Grammatica trasformazionale italiana (pp. 209-224). Roma: Bulzoni.
  • Seuren, P. A. M. (1994). The computational lexicon: All lexical content is predicate. In Z. Yusoff (Ed.), Proceedings of the International Conference on Linguistic Applications 26-28 July 1994 (pp. 211-216). Penang: Universiti Sains Malaysia, Unit Terjemahan Melalui Komputer (UTMK).
  • Seuren, P. A. M. (1991). What makes a text untranslatable? In H. M. N. Noor Ein, & H. S. Atiah (Eds.), Pragmatik Penterjemahan: Prinsip, Amalan dan Penilaian Menuju ke Abad 21 ("The Pragmatics of Translation: Principles, Practice and Evaluation Moving towards the 21st Century") (pp. 19-27). Kuala Lumpur: Dewan Bahasa dan Pustaka.
  • Seuren, P. A. M. (1994). Translation relations in semantic syntax. In G. Bouma, & G. Van Noord (Eds.), CLIN IV: Papers from the Fourth CLIN Meeting (pp. 149-162). Groningen: Vakgroep Alfa-informatica, Rijksuniversiteit Groningen.
  • Seyfeddinipur, M., & Kita, S. (2001). Gestures and dysfluencies in speech. In C. Cavé, I. Guaïtella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication. Actes du colloque ORAGE 2001 (pp. 266-270). Paris, France: Éditions L'Harmattan.
  • Shattuck-Hufnagel, S., & Cutler, A. (1999). The prosody of speech error corrections revisited. In J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. Bailey (Eds.), Proceedings of the Fourteenth International Congress of Phonetic Sciences: Vol. 2 (pp. 1483-1486). Berkely: University of California.

    Abstract

    A corpus of digitized speech errors is used to compare the prosody of correction patterns for word-level vs. sound-level errors. Results for both peak F0 and perceived prosodic markedness confirm that speakers are more likely to mark corrections of word-level errors than corrections of sound-level errors, and that errors ambiguous between word-level and soundlevel (such as boat for moat) show correction patterns like those for sound level errors. This finding increases the plausibility of the claim that word-sound-ambiguous errors arise at the same level of processing as sound errors that do not form words.
  • Sidnell, J., & Stivers, T. (Eds.). (2005). Multimodal Interaction [Special Issue]. Semiotica, 156.
  • Sprenger, S. A., & Van Rijn, H. (2005). Clock time naming: Complexities of a simple task. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th Annual Meeting of the Cognitive Science Society (pp. 2062-2067).
  • Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).

    Abstract

    This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals.
  • ten Bosch, L., Hämäläinen, A., Scharenborg, O., & Boves, L. (2006). Acoustic scores and symbolic mismatch penalties in phone lattices. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing [ICASSP 2006]. IEEE.

    Abstract

    This paper builds on previous work that aims at unraveling the structure of the speech signal by means of using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical search in which symbolic mismatches are allowed at certain costs. The focus is on the optimization of the costs of phone insertions, deletions and substitutions that are used in the lexical decoding pass. Two optimization approaches are presented, one related to a multi-pass computational model for human speech recognition, the other based on a decoding in which Bayes’ risks are minimized. In the final section, the advantages of these optimization methods are discussed and compared.
  • ten Bosch, L., & Scharenborg, O. (2005). ASR decoding in a computational model of human word recognition. In Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology (pp. 1241-1244). ISCA Archive.

    Abstract

    This paper investigates the interaction between acoustic scores and symbolic mismatch penalties in multi-pass speech decoding techniques that are based on the creation of a segment graph followed by a lexical search. The interaction between acoustic and symbolic mismatches determines to a large extent the structure of the search space of these multipass approaches. The background of this study is a recently developed computational model of human word recognition, called SpeM. SpeM is able to simulate human word recognition data and is built as a multi-pass speech decoder. Here, we focus on unravelling the structure of the search space that is used in SpeM and similar decoding strategies. Finally, we elaborate on the close relation between distances in this search space, and distance measures in search spaces that are based on a combination of acoustic and phonetic features.
  • Tuinman, A. (2006). Overcompensation of /t/ reduction in Dutch by German/Dutch bilinguals. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 101-102).
  • Van Ooijen, B., Cutler, A., & Norris, D. (1991). Detection times for vowels versus consonants. In Eurospeech 91: Vol. 3 (pp. 1451-1454). Genova: Istituto Internazionale delle Comunicazioni.

    Abstract

    This paper reports two experiments with vowels and consonants as phoneme detection targets in real words. In the first experiment, two relatively distinct vowels were compared with two confusible stop consonants. Response times to the vowels were longer than to the consonants. Response times correlated negatively with target phoneme length. In the second, two relatively distinct vowels were compared with their corresponding semivowels. This time, the vowels were detected faster than the semivowels. We conclude that response time differences between vowels and stop consonants in this task may reflect differences between phoneme categories in the variability of tokens, both in the acoustic realisation of targets and in the' representation of targets by subjects.
  • Van Geenhoven, V. (1999). A before-&-after picture of when-, before-, and after-clauses. In T. Matthews, & D. Strolovitch (Eds.), Proceedings of the 9th Semantics and Linguistic Theory Conference (pp. 283-315). Ithaca, NY, USA: Cornell University.
  • Van den Bos, E. J., & Poletiek, F. H. (2006). Implicit artificial grammar learning in adults and children. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 2619). Austin, TX, USA: Cognitive Science Society.
  • Vosse, T., & Kempen, G. (1991). A hybrid model of human sentence processing: Parsing right-branching, center-embedded and cross-serial dependencies. In M. Tomita (Ed.), Proceedings of the Second International Workshop on Parsing Technologies.
  • Walsh Dickey, L. (1999). Syllable count and Tzeltal segmental allomorphy. In J. Rennison, & K. Kühnhammer (Eds.), Phonologica 1996. Proceedings of the 8th International Phonology Meeting (pp. 323-334). Holland Academic Graphics.

    Abstract

    Tzeltal, a Mayan language spoken in southern Mexico, exhibits allo-morphy of an unusual type. The vowel quality of the perfective suffix is determined by the number of syllables in the stem to which it is attaching. This paper presents previously unpublished data of this allomorphy and demonstrates that a syllable-count analysis of the phenomenon is the proper one. This finding is put in a more general context of segment-prosody interaction in allomorphy.
  • Warner, N., Jongman, A., Mucke, D., & Cutler, A. (2001). The phonological status of schwa insertion in Dutch: An EMA study. In B. Maassen, W. Hulstijn, R. Kent, H. Peters, & P. v. Lieshout (Eds.), Speech motor control in normal and disordered speech: 4th International Speech Motor Conference (pp. 86-89). Nijmegen: Vantilt.

    Abstract

    Articulatory data are used to address the question of whether Dutch schwa insertion is a phonological or a phonetic process. By investigating tongue tip raising and dorsal lowering, we show that /l/ when it appears before inserted schwa is a light /l/, just as /l/ before an underlying schwa is, and unlike the dark /l/ before a consonant in non-insertion productions of the same words. The fact that inserted schwa can condition the light/dark /l/ alternation shows that schwa insertion involves the phonological insertion of a segment rather than phonetic adjustments to articulations.
  • Widlok, T. (2006). Two ways of looking at a Mangetti grove. In A. Takada (Ed.), Proceedings of the workshop: Landscape and society (pp. 11-16). Kyoto: 21st Century Center of Excellence Program.
  • Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1556-1559).

    Abstract

    Utilization of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview will be given of other enhancements in the latest versions of ELAN, that make it a useful tool in multimodality research.
  • Wittenburg, P., Broeder, D., Klein, W., Levinson, S. C., & Romary, L. (2006). Foundations of modern language resource archives. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 625-628).

    Abstract

    A number of serious reasons will convince an increasing amount of researchers to store their relevant material in centers which we will call "language resource archives". They combine the duty of taking care of long-term preservation as well as the task to give access to their material to different user groups. Access here is meant in the sense that an active interaction with the data will be made possible to support the integration of new data, new versions or commentaries of all sort. Modern Language Resource Archives will have to adhere to a number of basic principles to fulfill all requirements and they will have to be involved in federations to create joint language resource domains making it even more simple for the researchers to access the data. This paper makes an attempt to formulate the essential pillars language resource archives have to adhere to.

Share this page