Publications

Displaying 201 - 265 of 265
  • De Ruiter, L. E. (2008). How useful are polynomials for analyzing intonation? In Proceedings of Interspeech 2008 (pp. 785-789).

    Abstract

    This paper presents the first application of polynomial modeling as a means for validating phonological pitch accent labels to German data. It is compared to traditional phonetic analysis (measuring minima, maxima, alignment). The traditional method fares better in classification, but results are comparable in statistical accent pair testing. Robustness tests show that pitch correction is necessary in both cases. The approaches are discussed in terms of their practicability, applicability to other domains of research and interpretability of their results.
  • Sauter, D., Scott, S., & Calder, A. (2004). Categorisation of vocally expressed positive emotion: A first step towards basic positive emotions? [Abstract]. Proceedings of the British Psychological Society, 12, 111.

    Abstract

    Most of the study of basic emotion expressions has focused on facial expressions and little work has been done to specifically investigate happiness, the only positive of the basic emotions (Ekman & Friesen, 1971). However, a theoretical suggestion has been made that happiness could be broken down into discrete positive emotions, which each fulfil the criteria of basic emotions, and that these would be expressed vocally (Ekman, 1992). To empirically test this hypothesis, 20 participants categorised 80 paralinguistic sounds using the labels achievement, amusement, contentment, pleasure and relief. The results suggest that achievement, amusement and relief are perceived as distinct categories, which subjects accurately identify. In contrast, the categories of contentment and pleasure were systematically confused with other responses, although performance was still well above chance levels. These findings are initial evidence that the positive emotions engage distinct vocal expressions and may be considered to be distinct emotion categories.
  • Sauter, D., Eisner, F., Rosen, S., & Scott, S. K. (2008). The role of source and filter cues in emotion recognition in speech [Abstract]. Journal of the Acoustical Society of America, 123, 3739-3740.

    Abstract

    In the context of the source-filter theory of speech, it is well established that intelligibility is heavily reliant on information carried by the filter, that is, spectral cues (e.g., Faulkner et al., 2001; Shannon et al., 1995). However, the extraction of other types of information in the speech signal, such as emotion and identity, is less well understood. In this study we investigated the extent to which emotion recognition in speech depends on filterdependent cues, using a forced-choice emotion identification task at ten levels of noise-vocoding ranging between one and 32 channels. In addition, participants performed a speech intelligibility task with the same stimuli. Our results indicate that compared to speech intelligibility, emotion recognition relies less on spectral information and more on cues typically signaled by source variations, such as voice pitch, voice quality, and intensity. We suggest that, while the reliance on spectral dynamics is likely a unique aspect of human speech, greater phylogenetic continuity across species may be found in the communication of affect in vocalizations.
  • Sauter, D. (2008). The time-course of emotional voice processing [Abstract]. Neurocase, 14, 455-455.

    Abstract

    Research using event-related brain potentials (ERPs) has demonstrated an early differential effect in fronto-central regions when processing emotional, as compared to affectively neutral facial stimuli (e.g., Eimer & Holmes, 2002). In this talk, data demonstrating a similar effect in the auditory domain will be presented. ERPs were recorded in a one-back task where participants had to identify immediate repetitions of emotion category, such as a fearful sound followed by another fearful sound. The stimulus set consisted of non-verbal emotional vocalisations communicating positive and negative sounds, as well as neutral baseline conditions. Similarly to the facial domain, fear sounds as compared to acoustically controlled neutral sounds, elicited a frontally distributed positivity with an onset latency of about 150 ms after stimulus onset. These data suggest the existence of a rapid multi-modal frontocentral mechanism discriminating emotional from non-emotional human signals.
  • Scharenborg, O., Wan, V., & Moore, R. K. (2006). Capturing fine-phonetic variation in speech through automatic classification of articulatory features. In Speech Recognition and Intrinsic Variation Workshop [SRIV2006] (pp. 77-82). ISCA Archive.

    Abstract

    The ultimate goal of our research is to develop a computational model of human speech recognition that is able to capture the effects of fine-grained acoustic variation on speech recognition behaviour. As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. In the experiments reported here, we compared support vector machines (SVMs) with multilayer perceptrons (MLPs). MLPs have been widely (and rather successfully) used for the task of multi-value articulatory feature classification, while (to the best of our knowledge) SVMs have not. This paper compares the performances of the two classifiers and analyses the results in order to better understand the articulatory representations. It was found that the MLPs outperformed the SVMs, but it is concluded that both classifiers exhibit similar behaviour in terms of patterns of errors.
  • Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

    Abstract

    Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.
  • Scharenborg, O., Boves, L., & Ten Bosch, L. (2004). ‘On-line early recognition’ of polysyllabic words in continuous speech. In S. Cassidy, F. Cox, R. Mannell, & P. Sallyanne (Eds.), Proceedings of the Tenth Australian International Conference on Speech Science & Technology (pp. 387-392). Canberra: Australian Speech Science and Technology Association Inc.

    Abstract

    In this paper, we investigate the ability of SpeM, our recognition system based on the combination of an automatic phone recogniser and a wordsearch module, to determine as early as possible during the word recognition process whether a word is likely to be recognised correctly (this we refer to as ‘on-line’ early word recognition). We present two measures that can be used to predict whether a word is correctly recognised: the Bayesian word activation and the amount of available (acoustic) information for a word. SpeM was tested on 1,463 polysyllabic words in 885 continuous speech utterances. The investigated predictors indicated that a word activation that is 1) high (but not too high) and 2) based on more phones is more reliable to predict the correctness of a word than a similarly high value based on a small number of phones or a lower value of the word activation.
  • Scharenborg, O., McQueen, J. M., Ten Bosch, L., & Norris, D. (2003). Modelling human speech recognition using automatic speech recognition paradigms in SpeM. In Proceedings of Eurospeech 2003 (pp. 2097-2100). Adelaide: Causal Productions.

    Abstract

    We have recently developed a new model of human speech recognition, based on automatic speech recognition techniques [1]. The present paper has two goals. First, we show that the new model performs well in the recognition of lexically ambiguous input. These demonstrations suggest that the model is able to operate in the same optimal way as human listeners. Second, we discuss how to relate the behaviour of a recogniser, designed to discover the optimum path through a word lattice, to data from human listening experiments. We argue that this requires a metric that combines both path-based and word-based measures of recognition performance. The combined metric varies continuously as the input speech signal unfolds over time.
  • Scharenborg, O. (2008). Modelling fine-phonetic detail in a computational model of word recognition. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1473-1476). ISCA Archive.

    Abstract

    There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this finephonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed Fine- Tracker, is based on articulatory features. Simulations dealt with modelling the ability of listeners to distinguish short words (e.g., ‘ham’) from the longer words in which they are embedded (e.g., ‘hamster’). The simulations with Fine- Tracker showed that it was, like human listeners, able to distinguish between short words from the longer words in which they are embedded. This suggests that it is possible to extract this fine-phonetic detail from the speech signal and use it during word recognition.
  • Scharenborg, O., ten Bosch, L., & Boves, L. (2003). Recognising 'real-life' speech with SpeM: A speech-based computational model of human speech recognition. In Eurospeech 2003 (pp. 2285-2288).

    Abstract

    In this paper, we present a novel computational model of human speech recognition – called SpeM – based on the theory underlying Shortlist. We will show that SpeM, in combination with an automatic phone recogniser (APR), is able to simulate the human speech recognition process from the acoustic signal to the ultimate recognition of words. This joint model takes an acoustic speech file as input and calculates the activation flows of candidate words on the basis of the degree of fit of the candidate words with the input. Experiments showed that SpeM outperforms Shortlist on the recognition of ‘real-life’ input. Furthermore, SpeM performs only slightly worse than an off-the-shelf full-blown automatic speech recogniser in which all words are equally probable, while it provides a transparent computationally elegant paradigm for modelling word activations in human word recognition.
  • Schiller, N. O. (2003). Metrical stress in speech production: A time course study. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 451-454). Adelaide: Causal Productions.

    Abstract

    This study investigated the encoding of metrical information during speech production in Dutch. In Experiment 1, participants were asked to judge whether bisyllabic picture names had initial or final stress. Results showed significantly faster decision times for initially stressed targets (e.g., LEpel 'spoon') than for targets with final stress (e.g., liBEL 'dragon fly'; capital letters indicate stressed syllables) and revealed that the monitoring latencies are not a function of the picture naming or object recognition latencies to the same pictures. Experiments 2 and 3 replicated the outcome of the first experiment with bi- and trisyllabic picture names. These results demonstrate that metrical information of words is encoded rightward incrementally during phonological encoding in speech production. The results of these experiments are in line with Levelt's model of phonological encoding.
  • Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., & Sloetjes, H. (2008). An exchange format for multimodal annotations. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper presents the results of a joint effort of a group of multimodality researchers and tool developers to improve the interoperability between several tools used for the annotation of multimodality. We propose a multimodal annotation exchange format, based on the annotation graph formalism, which is supported by import and export routines in the respective tools
  • Schoenmakers, G.-J., & De Swart, P. (2019). Adverbial hurdles in Dutch scrambling. In A. Gattnar, R. Hörnig, M. Störzer, & S. Featherston (Eds.), Proceedings of Linguistic Evidence 2018: Experimental Data Drives Linguistic Theory (pp. 124-145). Tübingen: University of Tübingen.

    Abstract

    This paper addresses the role of the adverb in Dutch direct object scrambling constructions. We report four experiments in which we investigate whether the structural position and the scope sensitivity of the adverb affect acceptability judgments of scrambling constructions and native speakers' tendency to scramble definite objects. We conclude that the type of adverb plays a key role in Dutch word ordering preferences.
  • Schuerman, W. L., McQueen, J. M., & Meyer, A. S. (2019). Speaker statistical averageness modulates word recognition in adverse listening conditions. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1203-1207). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    We tested whether statistical averageness (SA) at the level of the individual speaker could predict a speaker’s intelligibility. 28 female and 21 male speakers of Dutch were recorded producing 336 sentences,
    each containing two target nouns. Recordings were compared to those of all other same-sex speakers using dynamic time warping (DTW). For each sentence, the DTW distance constituted a metric
    of phonetic distance from one speaker to all other speakers. SA comprised the average of these distances. Later, the same participants performed a word recognition task on the target nouns in the same sentences, under three degraded listening conditions. In all three conditions, accuracy increased with SA. This held even when participants listened to their own utterances. These findings suggest that listeners process speech with respect to the statistical
    properties of the language spoken in their community, rather than using their own speech as a reference
  • Schuppler, B., Ernestus, M., Scharenborg, O., & Boves, L. (2008). Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. In INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association (pp. 1638-1641). ISCA Archive.

    Abstract

    This paper presents the steps needed to make a corpus of Dutch spontaneous dialogues accessible for automatic phonetic research aimed at increasing our understanding of reduction phenomena and the role of fine phonetic detail. Since the corpus was not created with automatic processing in mind, it needed to be reshaped. The first part of this paper describes the actions needed for this reshaping in some detail. The second part reports the results of a preliminary analysis of the reduction phenomena in the corpus. For this purpose a phonemic transcription of the corpus was created by means of a forced alignment, first with a lexicon of canonical pronunciations and then with multiple pronunciation variants per word. In this study pronunciation variants were generated by applying a large set of phonetic processes that have been implicated in reduction to the canonical pronunciations of the words. This relatively straightforward procedure allows us to produce plausible pronunciation variants and to verify and extend the results of previous reduction studies reported in the literature.
  • Scott, S., & Sauter, D. (2006). Non-verbal expressions of emotion - acoustics, valence, and cross cultural factors. In Third International Conference on Speech Prosody 2006. ISCA.

    Abstract

    This presentation will address aspects of the expression of emotion in non-verbal vocal behaviour, specifically attempting to determine the roles of both positive and negative emotions, their acoustic bases, and the extent to which these are recognized in non-Western cultures.
  • Scott, S., & Sauter, D. (2004). Vocal expressions of emotion and positive and negative basic emotions [Abstract]. Proceedings of the British Psychological Society, 12, 156.

    Abstract

    Previous studies have indicated that vocal and facial expressions of the ‘basic’ emotions share aspects of processing. Thus amygdala damage compromises the perception of fear and anger from the face and from the voice. In the current study we tested the hypothesis that there exist positive basic emotions, expressed mainly in the voice (Ekman, 1992). Vocal stimuli were produced to express the specific positive emotions of amusement, achievement, pleasure, contentment and relief.
  • Seidl, A., & Johnson, E. K. (2003). Position and vowel quality effects in infant's segmentation of vowel-initial words. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2233-2236). Adelaide: Causal Productions.
  • Seidlmayer, E., Galke, L., Melnychuk, T., Schultz, C., Tochtermann, K., & Förstner, K. U. (2019). Take it personally - A Python library for data enrichment for infometrical applications. In M. Alam, R. Usbeck, T. Pellegrini, H. Sack, & Y. Sure-Vetter (Eds.), Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems co-located with 15th International Conference on Semantic Systems (SEMANTiCS 2019).

    Abstract

    Like every other social sphere, science is influenced by individual characteristics of researchers. However, for investigations on scientific networks, only little data about the social background of researchers, e.g. social origin, gender, affiliation etc., is available.
    This paper introduces ”Take it personally - TIP”, a conceptual model and library currently under development, which aims to support the
    semantic enrichment of publication databases with semantically related background information which resides elsewhere in the (semantic) web, such as Wikidata.
    The supplementary information enriches the original information in the publication databases and thus facilitates the creation of complex scientific knowledge graphs. Such enrichment helps to improve the scientometric analysis of scientific publications as they can also take social backgrounds of researchers into account and to understand social structure in research communities.
  • Seijdel, N., Sakmakidis, N., De Haan, E. H. F., Bohte, S. M., & Scholte, H. S. (2019). Implicit scene segmentation in deeper convolutional neural networks. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 1059-1062). doi:10.32470/CCN.2019.1149-0.

    Abstract

    Feedforward deep convolutional neural networks (DCNNs) are matching and even surpassing human performance on object recognition. This performance suggests that activation of a loose collection of image
    features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Recent findings in humans however, suggest that while feedforward activity may suffice for
    sparse scenes with isolated objects, additional visual operations ('routines') that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to
    performance of DCNNs with increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds they appear on. To this end, we controlled the information in both objects
    and backgrounds, as well as the relationship between them by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicated less distinction between object- and background features for more shallow networks. For those networks, we observed a benefit of training on segmented objects (as compared to unsegmented objects). Overall, deeper networks trained on natural
    (unsegmented) scenes seem to perform implicit 'segmentation' of the objects from their background, possibly by improved selection of relevant features.
  • Seuren, P. A. M. (1985). Predicate raising and semantic transparency in Mauritian Creole. In N. Boretzky, W. Enninger, & T. Stolz (Eds.), Akten des 2. Essener Kolloquiums über "Kreolsprachen und Sprachkontakte", 29-30 Nov. 1985 (pp. 203-229). Bochum: Brockmeyer.
  • Shatzman, K. B. (2004). Segmenting ambiguous phrases using phoneme duration. In S. Kin, & M. J. Bae (Eds.), Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech 2004-ICSLP) (pp. 329-332). Seoul: Sunjijn Printing Co.

    Abstract

    The results of an eye-tracking experiment are presented in which Dutch listeners' eye movements were monitored as they heard sentences and saw four pictured objects. Participants were instructed to click on the object mentioned in the sentence. In the critical sentences, a stop-initial target (e.g., "pot") was preceded by an [s], thus causing ambiguity regarding whether the sentence refers to a stop-initial or a cluster-initial word (e.g., "spot"). Participants made fewer fixations to the target pictures when the stop and the preceding [s] were cross-spliced from the cluster-initial word than when they were spliced from a different token of the sentence containing the stop-initial word. Acoustic analyses showed that the two versions differed in various measures, but only one of these - the duration of the [s] - correlated with the perceptual effect. Thus, in this context, the [s] duration information is an important factor guiding word recognition.
  • Shen, C., & Janse, E. (2019). Articulatory control in speech production. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2533-2537). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Shen, C., Cooke, M., & Janse, E. (2019). Individual articulatory control in speech enrichment. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the 23rd International Congress on Acoustics (pp. 5726-5730). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    ndividual talkers may use various strategies to enrich their speech while speaking in noise (i.e., Lombard speech) to improve their intelligibility. The resulting acoustic-phonetic changes in Lombard speech vary amongst different speakers, but it is unclear what causes these talker differences, and what impact these differences have on intelligibility. This study investigates the potential role of articulatory control in talkers’ Lombard speech enrichment success. Seventy-eight speakers read out sentences in both their habitual style and in a condition where they were instructed to speak clearly while hearing loud speech-shaped noise. A diadochokinetic (DDK) speech task that requires speakers to repetitively produce word or non-word sequences as accurately and as rapidly as possible, was used to quantify their articulatory control. Individuals’ predicted intelligibility in both speaking styles (presented at -5 dB SNR) was measured using an acoustic glimpse-based metric: the High-Energy Glimpse Proportion (HEGP). Speakers’ HEGP scores show a clear effect of speaking condition (better HEGP scores in the Lombard than habitual condition), but no simple effect of articulatory control on HEGP, nor an interaction between speaking condition and articulatory control. This indicates that individuals’ speech enrichment success as measured by the HEGP metric was not predicted by DDK performance.
  • Shi, R., Werker, J., & Cutler, A. (2003). Function words in early speech perception. In Proceedings of the 15th International Congress of Phonetic Sciences (pp. 3009-3012).

    Abstract

    Three experiments examined whether infants recognise functors in phrases, and whether their representations of functors are phonetically well specified. Eight- and 13- month-old English infants heard monosyllabic lexical words preceded by real functors (e.g., the, his) versus nonsense functors (e.g., kuh); the latter were minimally modified segmentally (but not prosodically) from real functors. Lexical words were constant across conditions; thus recognition of functors would appear as longer listening time to sequences with real functors. Eightmonth- olds' listening times to sequences with real versus nonsense functors did not significantly differ, suggesting that they did not recognise real functors, or functor representations lacked phonetic specification. However, 13-month-olds listened significantly longer to sequences with real functors. Thus, somewhere between 8 and 13 months of age infants learn familiar functors and represent them with segmental detail. We propose that accumulated frequency of functors in input in general passes a critical threshold during this time.
  • Sloetjes, H., & Wittenburg, P. (2008). Annotation by category - ELAN and ISO DCR. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    The Data Category Registry is one of the ISO initiatives towards the establishment of standards for Language Resource management, creation and coding. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps that have been taken to provide users of the multimedia annotation tool ELAN, with the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first gives a brief description of the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, a description is given of how the preliminary connectivity with the DCR is implemented in ELAN
  • De Sousa, H. (2008). The development of echo-subject markers in Southern Vanuatu. In T. J. Curnow (Ed.), Selected papers from the 2007 Conference of the Australian Linguistic Society. Australian Linguistic Society.

    Abstract

    One of the defining features of the Southern Vanuatu language family is the echo-subject (ES) marker (Lynch 2001: 177-178). Canonically, an ES marker indicates that the subject of the clause is coreferential with the subject of the preceding clause. This paper begins with a survey of the various ES systems found in Southern Vanuatu. Two prominent differences amongst the ES systems are: a) the level of obligatoriness of the ES marker; and b) the level of grammatical integration between an ES clauses and the preceding clause. The variation found amongst the ES systems reveals a clear path of grammaticalisation from the VP coordinator *ma in Proto–Southern Vanuatu to the various types of ES marker in contemporary Southern Vanuatu languages
  • Stehouwer, H., & Van den Bosch, A. (2008). Putting the t where it belongs: Solving a confusion problem in Dutch. In S. Verberne, H. Van Halteren, & P.-A. Coppen (Eds.), Computational Linguistics in the Netherlands 2007: Selected Papers from the 18th CLIN Meeting (pp. 21-36). Utrecht: LOT.

    Abstract

    A common Dutch writing error is to confuse a word ending in -d with a neighbor word ending in -dt. In this paper we describe the development of a machine-learning-based disambiguator that can determine which word ending is appropriate, on the basis of its local context. We develop alternative disambiguators, varying between a single monolithic classifier and having multiple confusable experts disambiguate between confusable pairs. Disambiguation accuracy of the best developed disambiguators exceeds 99%; when we apply these disambiguators to an external test set of collected errors, our detection strategy correctly identifies up to 79% of the errors.
  • Ten Bosch, L., Baayen, R. H., & Ernestus, M. (2006). On speech variation and word type differentiation by articulatory feature representations. In Proceedings of Interspeech 2006 (pp. 2230-2233).

    Abstract

    This paper describes ongoing research aiming at the description of variation in speech as represented by asynchronous articulatory features. We will first illustrate how distances in the articulatory feature space can be used for event detection along speech trajectories in this space. The temporal structure imposed by the cosine distance in articulatory feature space coincides to a large extent with the manual segmentation on phone level. The analysis also indicates that the articulatory feature representation provides better such alignments than the MFCC representation does. Secondly, we will present first results that indicate that articulatory features can be used to probe for acoustic differences in the onsets of Dutch singulars and plurals.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Turn-taking in social talk dialogues: Temporal, formal and functional aspects. In 9th International Conference Speech and Computer (SPECOM'2004) (pp. 454-461).

    Abstract

    This paper presents a quantitative analysis of the
    turn-taking mechanism evidenced in 93 telephone
    dialogues that were taken from the 9-million-word
    Spoken Dutch Corpus. While the first part of the paper
    focuses on the temporal phenomena of turn taking, such
    as durations of pauses and overlaps of turns in the
    dialogues, the second part explores the discoursefunctional
    aspects of utterances in a subset of 8
    dialogues that were annotated especially for this
    purpose. The results show that speakers adapt their turntaking
    behaviour to the interlocutor’s behaviour.
    Furthermore, the results indicate that male-male dialogs
    show a higher proportion of overlapping turns than
    female-female dialogues.
  • Ten Bosch, L., Mulder, K., & Boves, L. (2019). Phase synchronization between EEG signals as a function of differences between stimuli characteristics. In Proceedings of Interspeech 2019 (pp. 1213-1217). doi:10.21437/Interspeech.2019-2443.

    Abstract

    The neural processing of speech leads to specific patterns in the brain which can be measured as, e.g., EEG signals. When properly aligned with the speech input and averaged over many tokens, the Event Related Potential (ERP) signal is able to differentiate specific contrasts between speech signals. Well-known effects relate to the difference between expected and unexpected words, in particular in the N400, while effects in N100 and P200 are related to attention and acoustic onset effects. Most EEG studies deal with the amplitude of EEG signals over time, sidestepping the effect of phase and phase synchronization. This paper investigates the relation between phase in the EEG signals measured in an auditory lexical decision task by Dutch participants listening to full and reduced English word forms. We show that phase synchronization takes place across stimulus conditions, and that the so-called circular variance is narrowly related to the type of contrast between stimuli.
  • ten Bosch, L., Hämäläinen, A., Scharenborg, O., & Boves, L. (2006). Acoustic scores and symbolic mismatch penalties in phone lattices. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing [ICASSP 2006]. IEEE.

    Abstract

    This paper builds on previous work that aims at unraveling the structure of the speech signal by means of using probabilistic representations. The context of this work is a multi-pass speech recognition system in which a phone lattice is created and used as a basis for a lexical search in which symbolic mismatches are allowed at certain costs. The focus is on the optimization of the costs of phone insertions, deletions and substitutions that are used in the lexical decoding pass. Two optimization approaches are presented, one related to a multi-pass computational model for human speech recognition, the other based on a decoding in which Bayes’ risks are minimized. In the final section, the advantages of these optimization methods are discussed and compared.
  • Ten Bosch, L., Oostdijk, N., & De Ruiter, J. P. (2004). Durational aspects of turn-taking in spontaneous face-to-face and telephone dialogues. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue: Proceedings of the 7th International Conference TSD 2004 (pp. 563-570). Heidelberg: Springer.

    Abstract

    On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and faceto-face dialogues have different statistical properties. Pauses in a face-to-face
    dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face
    dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
  • Ter Bekke, M., Ozyurek, A., & Ünal, E. (2019). Speaking but not gesturing predicts motion event memory within and across languages. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2940-2946). Montreal, QB: Cognitive Science Society.

    Abstract

    In everyday life, people see, describe and remember motion events. We tested whether the type of motion event information (path or manner) encoded in speech and gesture predicts which information is remembered and if this varies across speakers of typologically different languages. We focus on intransitive motion events (e.g., a woman running to a tree) that are described differently in speech and co-speech gesture across languages, based on how these languages typologically encode manner and path information (Kita & Özyürek, 2003; Talmy, 1985). Speakers of Dutch (n = 19) and Turkish (n = 22) watched and described motion events. With a surprise (i.e. unexpected) recognition memory task, memory for manner and path components of these events was measured. Neither Dutch nor Turkish speakers’ memory for manner went above chance levels. However, we found a positive relation between path speech and path change detection: participants who described the path during encoding were more accurate at detecting changes to the path of an event during the memory task. In addition, the relation between path speech and path memory changed with native language: for Dutch speakers encoding path in speech was related to improved path memory, but for Turkish speakers no such relation existed. For both languages, co-speech gesture did not predict memory speakers. We discuss the implications of these findings for our understanding of the relations between speech, gesture, type of encoding in language and memory.
  • Trilsbeek, P., Broeder, D., Van Valkenhoef, T., & Wittenburg, P. (2008). A grid of regional language archives. In C. Calzolari (Ed.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (pp. 1474-1477). European Language Resources Association (ELRA).

    Abstract

    About two years ago, the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, started an initiative to install regional language archives in various places around the world, particularly in places where a large number of endangered languages exist and are being documented. These digital archives make use of the LAT archiving framework [1] that the MPI has developed
    over the past nine years. This framework consists of a number of web-based tools for depositing, organizing and utilizing linguistic resources in a digital archive. The regional archives are in principle autonomous archives, but they can decide to share metadata descriptions and language resources with the MPI archive in Nijmegen and become part of a grid of linked LAT archives. By doing so, they will also take advantage of the long-term preservation strategy of the MPI archive. This paper describes the reasoning
    behind this initiative and how in practice such an archive is set up.
  • Troncoso Ruiz, A., Ernestus, M., & Broersma, M. (2019). Learning to produce difficult L2 vowels: The effects of awareness-rasing, exposure and feedback. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1094-1098). Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Tuinman, A. (2006). Overcompensation of /t/ reduction in Dutch by German/Dutch bilinguals. In Variation, detail and representation: 10th Conference on Laboratory Phonology (pp. 101-102).
  • Van Dooren, A., Tulling, M., Cournane, A., & Hacquard, V. (2019). Discovering modal polysemy: Lexical aspect might help. In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 203-216). Sommerville, MA: Cascadilla Press.
  • Van Valin Jr., R. D. (1987). Aspects of the interaction of syntax and pragmatics: Discourse coreference mechanisms and the typology of grammatical systems. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 513-531). Amsterdam: Benjamins.
  • Van Uytvanck, D., Dukers, A., Ringersma, J., & Trilsbeek, P. (2008). Language-sites: Accessing and presenting language resources via geographic information systems. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Paris: European Language Resources Association (ELRA).

    Abstract

    The emerging area of Geographic Information Systems (GIS) has proven to add an interesting dimension to many research projects. Within the language-sites initiative we have brought together a broad range of links to digital language corpora and resources. Via Google Earth's visually appealing 3D-interface users can spin the globe, zoom into an area they are interested in and access directly the relevant language resources. This paper focuses on several ways of relating the map and the online data (lexica, annotations, multimedia recordings, etc.). Furthermore, we discuss some of the implementation choices that have been made, including future challenges. In addition, we show how scholars (both linguists and anthropologists) are using GIS tools to fulfill their specific research needs by making use of practical examples. This illustrates how both scientists and the general public can benefit from geography-based access to digital language data
  • Van den Bos, E. J., & Poletiek, F. H. (2006). Implicit artificial grammar learning in adults and children. In R. Sun (Ed.), Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci 2006) (pp. 2619). Austin, TX, USA: Cognitive Science Society.
  • Van Valin Jr., R. D. (1987). Pragmatics, island phenomena, and linguistic competence. In A. M. Farley, P. T. Farley, & K.-E. McCullough (Eds.), CLS 22. Papers from the parasession on pragmatics and grammatical theory (pp. 223-233). Chicago Linguistic Society.
  • Váradi, T., Wittenburg, P., Krauwer, S., Wynne, M., & Koskenniemi, K. (2008). CLARIN: Common language resources and technology infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).

    Abstract

    This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS).
  • Vosse, T. G., & Kempen, G. (2008). Parsing verb-final clauses in German: Garden-path and ERP effects modeled by a parallel dynamic parser. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th Annual Conference on the Cognitive Science Society (pp. 261-266). Washington: Cognitive Science Society.

    Abstract

    Experimental sentence comprehension studies have shown that superficially similar German clauses with verb-final word order elicit very different garden-path and ERP effects. We show that a computer implementation of the Unification Space parser (Vosse & Kempen, 2000) in the form of a localist-connectionist network can model the observed differences, at least qualitatively. The model embodies a parallel dynamic parser that, in contrast with existing models, does not distinguish between consecutive first-pass and reanalysis stages, and does not use semantic or thematic roles. It does use structural frequency data and animacy information.
  • Wagner, A., & Braun, A. (2003). Is voice quality language-dependent? Acoustic analyses based on speakers of three different languages. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 651-654). Adelaide: Causal Productions.
  • Wagner, M. A., Broersma, M., McQueen, J. M., & Lemhöfer, K. (2019). Imitating speech in an unfamiliar language and an unfamiliar non-native accent in the native language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 20195) (pp. 1362-1366). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study concerns individual differences in speech imitation ability and the role that lexical representations play in imitation. We examined 1) whether imitation of sounds in an unfamiliar language (L0) is related to imitation of sounds in an unfamiliar
    non-native accent in the speaker’s native language (L1) and 2) whether it is easier or harder to imitate speech when you know the words to be imitated. Fifty-nine native Dutch speakers imitated words with target vowels in Basque (/a/ and /e/) and Greekaccented
    Dutch (/i/ and /u/). Spectral and durational
    analyses of the target vowels revealed no relationship between the success of L0 and L1 imitation and no difference in performance between tasks (i.e., L1
    imitation was neither aided nor blocked by lexical knowledge about the correct pronunciation). The results suggest instead that the relationship of the vowels to native phonological categories plays a bigger role in imitation
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1437-1440). Adelaide: Causal Productions.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signalto-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A., & Melinger, A. (2008). Name dominance in spoken word recognition is (not) modulated by expectations: Evidence from synonyms. In A. Botinis (Ed.), Proceedings of ISCA Tutorial and Research Workshop On Experimental Linguistics (ExLing 2008) (pp. 225-228). Athens: University of Athens.

    Abstract

    Two German eye-tracking experiments tested whether top-down expectations interact with acoustically-driven word-recognition processes. Competitor objects with two synonymous names were paired with target objects whose names shared word onsets with either the dominant or the non-dominant name of the competitor. Non-dominant names of competitor objects were either introduced before the test session or not. Eye-movements were monitored while participants heard instructions to click on target objects. Results demonstrate dominant and non-dominant competitor names were considered for recognition, regardless of top-down expectations, though dominant names were always activated more strongly.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Weber, A., & Paris, G. (2004). The origin of the linguistic gender effect in spoken-word recognition: Evidence from non-native listening. In K. Forbus, D. Gentner, & T. Tegier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Erlbaum.

    Abstract

    Two eye-tracking experiments examined linguistic gender effects in non-native spoken-word recognition. French participants, who knew German well, followed spoken instructions in German to click on pictures on a computer screen (e.g., Wo befindet sich die Perle, “where is the pearl”) while their eye movements were monitored. The name of the target picture was preceded by a gender-marked article in the instructions. When a target and a competitor picture (with phonologically similar names) were of the same gender in both German and French, French participants fixated competitor pictures more than unrelated pictures. However, when target and competitor were of the same gender in German but of different gender in French, early fixations to the competitor picture were reduced. Competitor activation in the non-native language was seemingly constrained by native gender information. German listeners showed no such viewing time difference. The results speak against a form-based account of the linguistic gender effect. They rather support the notion that the effect originates from the grammatical level of language processing.
  • Weber, A. (2008). What the eyes can tell us about spoken-language comprehension [Abstract]. Journal of the Acoustical Society of America, 124, 2474-2474.

    Abstract

    Lexical recognition is typically slower in L2 than in L1. Part of the difficulty comes from a not precise enough processing of L2 phonemes. Consequently, L2 listeners fail to eliminate candidate words that L1 listeners can exclude from competing for recognition. For instance, the inability to distinguish /r/ from /l/ in rocket and locker makes for Japanese listeners both words possible candidates when hearing their onset (e.g., Cutler, Weber, and Otake, 2006). The L2 disadvantage can, however, be dispelled: For L2 listeners, but not L1 listeners, L2 speech from a non-native talker with the same language background is known to be as intelligible as L2 speech from a native talker (e.g., Bent and Bradlow, 2003). A reason for this may be that L2 listeners have ample experience with segmental deviations that are characteristic for their own accent. On this account, only phonemic deviations that are typical for the listeners’ own accent will cause spurious lexical activation in L2 listening (e.g., English magic pronounced as megic for Dutch listeners). In this talk, I will present evidence from cross-modal priming studies with a variety of L2 listener groups, showing how the processing of phonemic deviations is accent-specific but withstands fine phonetic differences.
  • Weber, A., & Mueller, K. (2004). Word order variation in German main clauses: A corpus analysis. In Proceedings of the 20th International Conference on Computational Linguistics.

    Abstract

    In this paper, we present empirical data from a corpus study on the linear order of subjects and objects in German main clauses. The aim was to establish the validity of three well-known ordering constraints: given complements tend to occur before new complements, definite before indefinite, and pronoun before full noun phrase complements. Frequencies of occurrences were derived for subject-first and object-first sentences from the German Negra corpus. While all three constraints held on subject-first sentences, results for object-first sentences varied. Our findings suggest an influence of grammatical functions on the ordering of verb complements.
  • Widlok, T. (2006). Two ways of looking at a Mangetti grove. In A. Takada (Ed.), Proceedings of the workshop: Landscape and society (pp. 11-16). Kyoto: 21st Century Center of Excellence Program.
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Wittenburg, P. (2004). The IMDI metadata concept. In S. F. Ferreira (Ed.), Workingmaterial on Building the LR&E Roadmap: Joint COCOSDA and ICCWLRE Meeting, (LREC2004). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Brugman, H., Broeder, D., & Russel, A. (2004). XML-based language archiving. In Workshop Proceedings on XML-based Richly Annotaded Corpora (LREC2004) (pp. 63-69). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Gulrajani, G., Broeder, D., & Uneson, M. (2004). Cross-disciplinary integration of metadata descriptions. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 113-116). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 1556-1559).

    Abstract

    Utilization of computer tools in linguistic research has gained importance with the maturation of media frameworks for the handling of digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to the developments in the area of time accuracy. In subsequent sections an overview will be given of other enhancements in the latest versions of ELAN, that make it a useful tool in multimodality research.
  • Wittenburg, P., Johnson, H., Buchhorn, M., Brugman, H., & Broeder, D. (2004). Architecture for distributed language resource management and archiving. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004) (pp. 361-364). Paris: ELRA - European Language Resources Association.
  • Wittenburg, P., Broeder, D., Klein, W., Levinson, S. C., & Romary, L. (2006). Foundations of modern language resource archives. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp. 625-628).

    Abstract

    A number of serious reasons will convince an increasing amount of researchers to store their relevant material in centers which we will call "language resource archives". They combine the duty of taking care of long-term preservation as well as the task to give access to their material to different user groups. Access here is meant in the sense that an active interaction with the data will be made possible to support the integration of new data, new versions or commentaries of all sort. Modern Language Resource Archives will have to adhere to a number of basic principles to fulfill all requirements and they will have to be involved in federations to create joint language resource domains making it even more simple for the researchers to access the data. This paper makes an attempt to formulate the essential pillars language resource archives have to adhere to.
  • Wolf, M. C., Smith, A. C., Meyer, A. S., & Rowland, C. F. (2019). Modality effects in vocabulary acquisition. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 1212-1218). Montreal, QB: Cognitive Science Society.

    Abstract

    It is unknown whether modality affects the efficiency with which humans learn novel word forms and their meanings, with previous studies reporting both written and auditory advantages. The current study implements controls whose absence in previous work likely offers explanation for such contradictory findings. In two novel word learning experiments, participants were trained and tested on pseudoword - novel object pairs, with controls on: modality of test, modality of meaning, duration of exposure and transparency of word form. In both experiments word forms were presented in either their written or spoken form, each paired with a pictorial meaning (novel object). Following a 20-minute filler task, participants were tested on their ability to identify the picture-word form pairs on which they were trained. A between subjects design generated four participant groups per experiment 1) written training, written test; 2) written training, spoken test; 3) spoken training, written test; 4) spoken training, spoken test. In Experiment 1 the written stimulus was presented for a time period equal to the duration of the spoken form. Results showed that when the duration of exposure was equal, participants displayed a written training benefit. Given words can be read faster than the time taken for the spoken form to unfold, in Experiment 2 the written form was presented for 300 ms, sufficient time to read the word yet 65% shorter than the duration of the spoken form. No modality effect was observed under these conditions, when exposure to the word form was equivalent. These results demonstrate, at least for proficient readers, that when exposure to the word form is controlled across modalities the efficiency with which word form-meaning associations are learnt does not differ. Our results therefore suggest that, although we typically begin as aural-only word learners, we ultimately converge on developing learning mechanisms that learn equally efficiently from both written and spoken materials.
  • Zinn, C., Cablitz, G., Ringersma, J., Kemps-Snijders, M., & Wittenburg, P. (2008). Constructing knowledge spaces from linguistic resources. In Proceedings of the CIL 18 Workshop on Linguistic Studies of Ontology: From lexical semantics to formal ontologies and back.
  • Zinn, C. (2008). Conceptual spaces in ViCoS. In S. Bechhofer, M. Hauswirth, J. Hoffmann, & M. Koubarakis (Eds.), The semantic web: Research and applications (pp. 890-894). Berlin: Springer.

    Abstract

    We describe ViCoS, a tool for constructing and visualising conceptual spaces in the area of language documentation. ViCoS allows users to enrich existing lexical information about the words of a language with conceptual knowledge. Their work towards language-based, informal ontology building must be supported by easy-to-use workflows and supporting software, which we will demonstrate.
  • Zwitserlood, I., Ozyurek, A., & Perniss, P. M. (2008). Annotation of sign and gesture cross-linguistically. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd, & I. Zwitserlood (Eds.), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages (pp. 185-190). Paris: ELDA.

    Abstract

    This paper discusses the construction of a cross-linguistic, bimodal corpus containing three modes of expression: expressions from two sign languages, speech and gestural expressions in two spoken languages and pantomimic expressions by users of two spoken languages who are requested to convey information without speaking. We discuss some problems and tentative solutions for the annotation of utterances expressing spatial information about referents in these three modes, suggesting a set of comparable codes for the description of both sign and gesture. Furthermore, we discuss the processing of entered annotations in ELAN, e.g. relating descriptive annotations to analytic annotations in all three modes and performing relational searches across annotations on different tiers.

Share this page