Publications

  • Alhama, R. G., Rowland, C. F., & Kidd, E. (2020). Evaluating word embeddings for language acquisition. In E. Chersoni, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 38-42). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL). doi:10.18653/v1/2020.cmcl-1.4.

    Abstract

    Continuous vector word representations (or word embeddings) have shown success in capturing semantic relations between words, as evidenced by evaluation against behavioral data of adult performance on semantic tasks (Pereira et al., 2016). Adult semantic knowledge is the endpoint of a language acquisition process; thus, a relevant question is whether these models can also capture emerging word representations of young language learners. However, the data for children’s semantic knowledge across development is scarce. In this paper, we propose to bridge this gap by using Age of Acquisition norms to evaluate word embeddings learnt from child-directed input. We present two methods that evaluate word embeddings in terms of (a) the semantic neighbourhood density of learnt words, and (b) convergence to adult word associations. We apply our methods to bag-of-words models, and find that (1) children acquire words with fewer semantic neighbours earlier, and (2) young learners only attend to very local context. These findings provide converging evidence for the validity of our methods in understanding the prerequisite features for a distributional model of word learning.
  • Alhama, R. G., Siegelman, N., Frost, R., & Armstrong, B. C. (2019). The role of information in visual word recognition: A perceptually-constrained connectionist account. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 83-89). Austin, TX: Cognitive Science Society.

    Abstract

    Proficient readers typically fixate near the center of a word, with a slight bias towards word onset. We explore a novel account of this phenomenon based on combining information theory with visual perceptual constraints in a connectionist model of visual word recognition. This account posits that the amount of information-content available for word identification varies across fixation locations and across languages, thereby explaining the overall fixation location bias in different languages, making the novel prediction that certain words are more readily identified when fixating at an atypical fixation location, and predicting specific cross-linguistic differences. We tested these predictions across several simulations in English and Hebrew, and in a pilot behavioral experiment. Results confirmed that the bias to fixate closer to word onset aligns with maximizing information in the visual signal, that some words are more readily identified at atypical fixation locations, and that these effects vary to some degree across languages.
  • Allerhand, M., Butterfield, S., Cutler, A., & Patterson, R. (1992). Assessing syllable strength via an auditory model. In Proceedings of the Institute of Acoustics: Vol. 14 Part 6 (pp. 297-304). St. Albans, Herts: Institute of Acoustics.
  • Almeida, L., Amdal, I., Beires, N., Boualem, M., Boves, L., Den Os, E., Filoche, P., Gomes, R., Knudsen, J. E., Kvale, K., Rugelbak, J., Tallec, C., & Warakagoda, N. (2002). Implementing and evaluating a multimodal tourist guide. In J. v. Kuppevelt, L. Dybkjær, & N. Bernsen (Eds.), Proceedings of the International CLASS Workshop on Natural, Intelligent and Effective Interaction in Multimodal Dialogue System (pp. 1-7). Copenhagen: Kluwer.
  • Asano, Y., Yuan, C., Grohe, A.-K., Weber, A., Antoniou, M., & Cutler, A. (2020). Uptalk interpretation as a function of listening experience. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 735-739). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-150.

    Abstract

    The term “uptalk” describes utterance-final pitch rises that carry no sentence-structural information. Uptalk is usually dialectal or sociolectal, and Australian English (AusEng) is particularly known for this attribute. We ask here whether experience with an uptalk variety affects listeners’ ability to categorise rising pitch contours on the basis of the timing and height of their onset and offset. Listeners were two groups of native English speakers (AusEng and American English) and three groups of listeners with L2 English: one group with Mandarin as L1 and experience of listening to AusEng, one with German as L1 and experience of listening to AusEng, and one with German as L1 but no AusEng experience. They heard nouns (e.g. flower, piano) in the framework “Got a NOUN”, each ending with a pitch rise artificially manipulated on three contrasts: low vs. high rise onset, low vs. high rise offset and early vs. late rise onset. Their task was to categorise the tokens as “question” or “statement”, and we analysed the effect of the pitch contrasts on their judgements. Only the native AusEng listeners were able to use the pitch contrasts systematically in making these categorisations.
  • Badimala, P., Mishra, C., Venkataramana, R. K. M., Bukhari, S. S., & Dengel, A. (2019). A Study of Various Text Augmentation Techniques for Relation Classification in Free Text. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (pp. 360-367). Setúbal, Portugal: SciTePress Digital Library. doi:10.5220/0007311003600367.

    Abstract

    Data augmentation techniques have been widely used in visual recognition tasks because it is easy to generate new data by simple and straightforward image transformations. However, when it comes to text data augmentation, it is difficult to find appropriate transformation techniques that also preserve the contextual and grammatical structure of language texts. In this paper, we explore various text data augmentation techniques in text space and word embedding space. We study the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Listening with great expectations: An investigation of word form anticipations in naturalistic speech. In Proceedings of Interspeech 2019 (pp. 2265-2269). doi:10.21437/Interspeech.2019-2741.

    Abstract

    The event-related potential (ERP) component named phonological mismatch negativity (PMN) arises when listeners hear an unexpected word form in a spoken sentence [1]. The PMN is thought to reflect the mismatch between expected and perceived auditory speech input. In this paper, we use the PMN to test a central premise in the predictive coding framework [2], namely that the mismatch between prior expectations and sensory input is an important mechanism of perception. We test this with natural speech materials containing approximately 50,000 word tokens. The corresponding EEG signal was recorded while participants (n = 48) listened to these materials. Following [3], we quantify the mismatch with two word probability distributions (WPD): a WPD based on preceding context, and a WPD that is additionally updated based on the incoming audio of the current word. We use the between-WPD cross entropy for each word in the utterances and show that a higher cross entropy correlates with a more negative PMN. Our results show that listeners anticipate auditory input while processing each word in naturalistic speech. Moreover, complementing previous research, we show that predictive language processing occurs across the whole probability spectrum.
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Quantifying expectation modulation in human speech processing. In Proceedings of Interspeech 2019 (pp. 2270-2274). doi:10.21437/Interspeech.2019-2685.

    Abstract

    The mismatch between top-down predicted and bottom-up perceptual input is an important mechanism of perception according to the predictive coding framework (Friston, [1]). In this paper we develop and validate a new information-theoretic measure that quantifies the mismatch between expected and observed auditory input during speech processing. We argue that such a mismatch measure is useful for the study of speech processing. To compute the mismatch measure, we use naturalistic speech materials containing approximately 50,000 word tokens. For each word token we first estimate the prior word probability distribution with the aid of statistical language modelling, and next use automatic speech recognition to update this word probability distribution based on the unfolding speech signal. We validate the mismatch measure with multiple analyses, and show that the auditory-based update improves the probability of the correct word and lowers the uncertainty of the word probability distribution. Based on these results, we argue that it is possible to explicitly estimate the mismatch between predicted and perceived speech input with the cross entropy between word expectations computed before and after an auditory update.
  • De Boer, B., Thompson, B., Ravignani, A., & Boeckx, C. (2020). Analysis of mutation and fixation for language. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 56-58). Nijmegen: The Evolution of Language Conferences.
  • Bowerman, M., Brown, P., Eisenbeiss, S., Narasimhan, B., & Slobin, D. I. (2002). Putting things in places: Developmental consequences of linguistic typology. In E. V. Clark (Ed.), Proceedings of the 31st Stanford Child Language Research Forum. Space in language: Location, motion, path, and manner (pp. 1-29). Stanford: Center for the Study of Language & Information.

    Abstract

    This study explores how adults and children describe placement events (e.g., putting a book on a table) in a range of different languages (Finnish, English, German, Russian, Hindi, Tzeltal Maya, Spanish, and Turkish). Results show that the eight languages grammatically encode placement events in two main ways (Talmy, 1985, 1991), but further investigation reveals fine-grained crosslinguistic variation within each of the two groups. Children are sensitive to these finer-grained characteristics of the input language at an early age, but only when such features are perceptually salient. Our study demonstrates that a unitary notion of 'event' does not suffice to characterize complex but systematic patterns of event encoding crosslinguistically, and that children are sensitive to multiple influences, including the distributional properties of the target language, in constructing these patterns in their own speech.
  • Brehm, L., Jackson, C. N., & Miller, K. L. (2019). Incremental interpretation in the first and second language. In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 109-122). Somerville, MA: Cascadilla Press.
  • Broeder, D., Offenga, F., & Willems, D. (2002). Metadata tools supporting controlled vocabulary services. In M. Rodriguez González, & C. Paz Suárez Araujo (Eds.), Third international conference on language resources and evaluation (pp. 1055-1059). Paris: European Language Resources Association.

    Abstract

    Within the ISLE Metadata Initiative (IMDI) project, a user-friendly editor to enter metadata descriptions and a browser operating on the linked metadata descriptions were developed. Both tools support the usage of Controlled Vocabulary (CV) repositories by means of the specification of a URL where the formal CV definition data is available.
  • Broeder, D., Wittenburg, P., Declerck, T., & Romary, L. (2002). LREP: A language repository exchange protocol. In M. Rodriguez González, & C. Paz Suárez Araujo (Eds.), Third international conference on language resources and evaluation (pp. 1302-1305). Paris: European Language Resources Association.

    Abstract

    The recent increase in the number and complexity of the language resources available on the Internet has been accompanied by a similar increase in available tools for linguistic analysis. Ideally, the user does not need to be confronted with the question of how to match tools with resources. If resource repositories and tool repositories offer adequate metadata information and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.
  • Broersma, M. (2002). Comprehension of non-native speech: Inaccurate phoneme processing and activation of lexical competitors. In ICSLP-2002 (pp. 261-264). Denver: Center for Spoken Language Research, U. of Colorado Boulder.

    Abstract

    Native speakers of Dutch with English as a second language and native speakers of English participated in an English lexical decision experiment. Phonemes in real words were replaced by others from which they are hard to distinguish for Dutch listeners. Non-native listeners judged the resulting near-words to be words more often than native listeners did. This happened not only when the exchanged phonemes did not exist as separate phonemes in the native language Dutch, but also when phoneme pairs that do exist in Dutch were used in word-final position, where they are not distinctive in Dutch. In an English bimodal priming experiment with similar groups of participants, word pairs were used which differed in one phoneme. These phonemes were hard to distinguish for the non-native listeners. Whereas in native listening both words inhibited each other, in non-native listening presentation of one word led to unresolved competition between both words. The results suggest that inaccurate phoneme processing by non-native listeners leads to the activation of spurious lexical competitors.
  • Bruggeman, L., & Cutler, A. (2019). The dynamics of lexical activation and competition in bilinguals’ first versus second language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1342-1346). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Speech input causes listeners to activate multiple candidate words which then compete with one another. These include onset competitors, which share a beginning (bumper, butter), but also, counterintuitively, rhyme competitors, sharing an ending (bumper, jumper). In L1, competition is typically stronger for onset than for rhyme. In L2, onset competition has been attested but rhyme competition has heretofore remained largely unexamined. We assessed L1 (Dutch) and L2 (English) word recognition by the same late-bilingual individuals. In each language, eye gaze was recorded as listeners heard sentences and viewed sets of drawings: three unrelated, one depicting an onset or rhyme competitor of a word in the input. Activation patterns revealed substantial onset competition but no significant rhyme competition in either L1 or L2. Rhyme competition may thus be a “luxury” feature of maximally efficient listening, to be abandoned when resources are scarcer, as in listening by late bilinguals, in either language.
  • Brugman, H., Levinson, S. C., Skiba, R., & Wittenburg, P. (2002). The DOBES archive: Its purpose and implementation. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics (pp. 11-11). Paris: European Language Resources Association.
  • Brugman, H., Spenke, H., Kramer, M., & Klassmann, A. (2002). Multimedia annotation with multilingual input methods and search support.
  • Brugman, H., Wittenburg, P., Levinson, S. C., & Kita, S. (2002). Multimodal annotations in gesture and sign language studies. In M. Rodriguez González, & C. Paz Suárez Araujo (Eds.), Third international conference on language resources and evaluation (pp. 176-182). Paris: European Language Resources Association.

    Abstract

    For multimodal annotations, an exhaustive encoding system for gestures was developed to facilitate research. The structural requirements of multimodal annotations were analyzed to develop an Abstract Corpus Model, which is the basis for a powerful annotation and exploitation tool for multimedia recordings and for the definition of the XML-based EUDICO Annotation Format. Finally, a metadata-based data management environment has been set up to facilitate resource discovery and especially corpus management. By means of an appropriate digitization policy and the online availability of the material, researchers have been able to build up a large corpus covering gesture and sign language data.
  • Cablitz, G. (2002). The acquisition of an absolute system: learning to talk about space in Marquesan (Oceanic, French Polynesia). In E. V. Clark (Ed.), Space in language: Location, motion, path, and manner (pp. 40-49). Stanford: Center for the Study of Language & Information (Electronic proceedings).
  • Chen, A., Gussenhoven, C., & Rietveld, T. (2002). Language-specific uses of the effort code. In B. Bel, & I. Marlien (Eds.), Proceedings of the 1st Conference on Speech Prosody (pp. 215-218). Aix-en-Provence: Université de Provence.

    Abstract

    Two groups of listeners with Dutch and British English language backgrounds judged Dutch and British English utterances, respectively, which varied in the intonation contour on the scales EMPHATIC vs. NOT EMPHATIC and SURPRISED vs. NOT SURPRISED, two meanings derived from the Effort Code. The stimuli, which differed in sentence mode but were otherwise lexically equivalent, were varied in peak height, peak alignment, end pitch, and overall register. In both languages, there are positive correlations between peak height and degree of emphasis, between peak height and degree of surprise, between peak alignment and degree of surprise, and between pitch register and degree of surprise. However, in all these cases, Dutch stimuli led to larger perceived meaning differences than the British English stimuli. This difference in the extent to which increased pitch height triggers increases in perceived emphasis and surprise is argued to be due to the difference in the standard pitch ranges between Dutch and British English. In addition, we found a positive correlation between pitch register and the degree of emphasis in Dutch, but a negative correlation in British English. This is an unexpected difference, which illustrates a case of ambiguity in the meaning of pitch.
  • Cutler, A., McQueen, J. M., Jansonius, M., & Bayerl, S. (2002). The lexical statistics of competitor activation in spoken-word recognition. In C. Bow (Ed.), Proceedings of the 9th Australian International Conference on Speech Science and Technology (pp. 40-45). Canberra: Australian Speech Science and Technology Association (ASSTA).

    Abstract

    The Possible Word Constraint is a proposed mechanism whereby listeners avoid recognising words spuriously embedded in other words. It applies to words leaving a vowelless residue between their edge and the nearest known word or syllable boundary. The present study tests the usefulness of this constraint via lexical statistics of both English and Dutch. The analyses demonstrate that the constraint removes a clear majority of embedded words in speech, and thus can contribute significantly to the efficiency of human speech recognition.
  • Cutler, A., Burchfield, A., & Antoniou, M. (2019). A criterial interlocutor tally for successful talker adaptation? In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1485-1489). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Part of the remarkable efficiency of listening is accommodation to unfamiliar talkers’ specific pronunciations by retuning of phonemic intercategory boundaries. Such retuning occurs in second (L2) as well as first language (L1); however, recent research with emigrés revealed successful adaptation in the environmental L2 but, unprecedentedly, not in L1 despite continuing L1 use. A possible explanation involving relative exposure to novel talkers is here tested in heritage language users with Mandarin as family L1 and English as environmental language. In English, exposure to an ambiguous sound in disambiguating word contexts prompted the expected adjustment of phonemic boundaries in subsequent categorisation. However, no adjustment occurred in Mandarin, again despite regular use. Participants reported highly asymmetric interlocutor counts in the two languages. We conclude that successful retuning ability requires regular exposure to novel talkers in the language in question, a criterion not met for the emigrés’ or for these heritage users’ L1.
  • Cutler, A. (1987). Components of prosodic effects in speech recognition. In Proceedings of the Eleventh International Congress of Phonetic Sciences: Vol. 1 (pp. 84-87). Tallinn: Academy of Sciences of the Estonian SSR, Institute of Language and Literature.

    Abstract

    Previous research has shown that listeners use the prosodic structure of utterances in a predictive fashion in sentence comprehension, to direct attention to accented words. Acoustically identical words spliced into sentence contexts are responded to differently if the prosodic structure of the context is varied: when the preceding prosody indicates that the word will be accented, responses are faster than when the preceding prosody is inconsistent with accent occurring on that word. In the present series of experiments, speech hybridisation techniques were first used to interchange the timing patterns within pairs of prosodic variants of utterances, independently of the pitch and intensity contours. The time-adjusted utterances could then serve as a basis for the orthogonal manipulation of the three prosodic dimensions of pitch, intensity and rhythm. The overall pattern of results showed that when listeners use prosody to predict accent location, they do not simply rely on a single prosodic dimension, but exploit the interaction between pitch, intensity and rhythm.
  • Cutler, A., Kearns, R., Norris, D., & Scott, D. (1992). Listeners’ responses to extraneous signals coincident with English and French speech. In J. Pittam (Ed.), Proceedings of the 4th Australian International Conference on Speech Science and Technology (pp. 666-671). Canberra: Australian Speech Science and Technology Association.

    Abstract

    English and French listeners performed two tasks (click location and speeded click detection) with both English and French sentences, closely matched for syntactic and phonological structure. Clicks were located more accurately in open- than in closed-class words in both English and French; they were detected more rapidly in open- than in closed-class words in English, but not in French. The two listener groups produced the same pattern of responses, suggesting that higher-level linguistic processing was not involved in these tasks.
  • Cutler, A., & Robinson, T. (1992). Response time as a metric for comparison of speech recognition by humans and machines. In J. Ohala, T. Nearey, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 189-192). Alberta: University of Alberta.

    Abstract

    The performance of automatic speech recognition systems is usually assessed in terms of error rate. Human speech recognition produces few errors, but relative difficulty of processing can be assessed via response time techniques. We report the construction of a measure analogous to response time in a machine recognition system. This measure may be compared directly with human response times. We conducted a trial comparison of this type at the phoneme level, including both tense and lax vowels and a variety of consonant classes. The results suggested similarities between human and machine processing in the case of consonants, but differences in the case of vowels.
  • Cutler, A., & Carter, D. (1987). The prosodic structure of initial syllables in English. In J. Laver, & M. Jack (Eds.), Proceedings of the European Conference on Speech Technology: Vol. 1 (pp. 207-210). Edinburgh: IEE.
  • Dideriksen, C., Fusaroli, R., Tylén, K., Dingemanse, M., & Christiansen, M. H. (2019). Contextualizing Conversational Strategies: Backchannel, Repair and Linguistic Alignment in Spontaneous and Task-Oriented Conversations. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci 2019) (pp. 261-267). Montreal, QC: Cognitive Science Society.

    Abstract

    Do interlocutors adjust their conversational strategies to the specific contextual demands of a given situation? Prior studies have yielded conflicting results, making it unclear how strategies vary with demands. We combine insights from qualitative and quantitative approaches in a within-participant experimental design involving two different contexts: spontaneously occurring conversations (SOC) and task-oriented conversations (TOC). We systematically assess backchanneling, other-repair and linguistic alignment. We find that SOC exhibit a higher number of backchannels, a reduced and more generic repair format and higher rates of lexical and syntactic alignment. TOC are characterized by a high number of specific repairs and a lower rate of lexical and syntactic alignment. However, when alignment occurs, more linguistic forms are aligned. The findings show that conversational strategies adapt to specific contextual demands.
  • Dieuleveut, A., Van Dooren, A., Cournane, A., & Hacquard, V. (2019). Acquiring the force of modals: Sig you guess what sig means? In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 189-202). Somerville, MA: Cascadilla Press.
  • Dimroth, C., & Lasser, I. (Eds.). (2002). Finite options: How L1 and L2 learners cope with the acquisition of finiteness [Special Issue]. Linguistics, 40(4).
  • Doumas, L. A. A., Martin, A. E., & Hummel, J. E. (2020). Relation learning in a neurocomputational architecture supports cross-domain transfer. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 932-937). Montreal, QC: Cognitive Science Society.

    Abstract

    Humans readily generalize, applying prior knowledge to novel situations and stimuli. Advances in machine learning have begun to approximate and even surpass human performance, but these systems struggle to generalize what they have learned to untrained situations. We present a model based on well-established neurocomputational principles that demonstrates human-level generalisation. This model is trained to play one video game (Breakout) and performs one-shot generalisation to a new game (Pong) with different characteristics. The model generalizes because it learns structured representations that are functionally symbolic (viz., a role-filler binding calculus) from unstructured training data. It does so without feedback, and without requiring that structured representations are specified a priori. Specifically, the model uses neural co-activation to discover which characteristics of the input are invariant and to learn relational predicates, and oscillatory regularities in network firing to bind predicates to arguments. To our knowledge, this is the first demonstration of human-like generalisation in a machine system that does not assume structured representations to begin with.
  • Eijk, L., Ernestus, M., & Schriefers, H. (2019). Alignment of pitch and articulation rate. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2690-2694). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Previous studies have shown that speakers align their speech to each other at multiple linguistic levels. This study investigates whether alignment is mostly the result of priming from the immediately preceding speech materials, focussing on pitch and articulation rate (AR). Native Dutch speakers completed sentences, first by themselves (pre-test), then in alternation with Confederate 1 (Round 1), with Confederate 2 (Round 2), with Confederate 1 again (Round 3), and lastly by themselves again (post-test). Results indicate that participants aligned to the confederates and that this alignment lasted during the post-test. The confederates’ directly preceding sentences were not good predictors for the participants’ pitch and AR. Overall, the results indicate that alignment is more of a global effect than a local priming effect.
  • Enfield, N. J. (2002). Parallel innovation and 'coincidence' in linguistic areas: On a bi-clausal extent/result constructions of mainland Southeast Asia. In P. Chew (Ed.), Proceedings of the 28th meeting of the Berkeley Linguistics Society. Special session on Tibeto-Burman and Southeast Asian linguistics (pp. 121-128). Berkeley: Berkeley Linguistics Society.
  • Ergin, R., Raviv, L., Senghas, A., Padden, C., & Sandler, W. (2020). Community structure affects convergence on uniform word orders: Evidence from emerging sign languages. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 84-86). Nijmegen: The Evolution of Language Conferences.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Evaluating dictation task measures for the study of speech perception. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 383-387). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This paper shows that the dictation task, a well-known testing instrument in language education, has untapped potential as a research tool for studying speech perception. We describe how transcriptions can be scored on measures of lexical, orthographic, phonological, and semantic similarity to target phrases to provide comprehensive information about accuracy at different processing levels. The first three measures are automatically extractable, increasing objectivity, and the middle two are gradient, providing finer-grained information than the measures traditionally used. We evaluate the measures in an English dictation task featuring phonetically reduced continuous speech. Whereas the lexical and orthographic measures emphasize listeners’ word identification difficulties, the phonological measure demonstrates that listeners can often still recover phonological features, and the semantic measure captures their ability to get the gist of the utterances. Correlational analyses and a discussion of practical and theoretical considerations show that combining multiple measures improves the dictation task’s utility as a research tool.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Lexically guided perceptual learning of a vowel shift in an interactive L2 listening context. In Proceedings of Interspeech 2019 (pp. 3123-3127). doi:10.21437/Interspeech.2019-1414.

    Abstract

    Lexically guided perceptual learning has traditionally been studied with ambiguous consonant sounds to which native listeners are exposed in a purely receptive listening context. To extend previous research, we investigate whether lexically guided learning applies to a vowel shift encountered by non-native listeners in an interactive dialogue. Dutch participants played a two-player game in English in either a control condition, which contained no evidence for a vowel shift, or a lexically constraining condition, in which onscreen lexical information required them to re-interpret their interlocutor’s /ɪ/ pronunciations as representing /ε/. A phonetic categorization pre-test and post-test were used to assess whether the game shifted listeners’ phonemic boundaries such that more of the /ε/-/ɪ/ continuum came to be perceived as /ε/. Both listener groups showed an overall post-test shift toward /ɪ/, suggesting that vowel perception may be sensitive to directional biases related to properties of the speaker’s vowel space. Importantly, listeners in the lexically constraining condition made relatively more post-test /ε/ responses than the control group, thereby exhibiting an effect of lexically guided adaptation. The results thus demonstrate that non-native listeners can adjust their phonemic boundaries on the basis of lexical information to accommodate a vowel shift learned in interactive conversation.
  • Fisher, S. E., & Tilot, A. K. (Eds.). (2019). Bridging senses: Novel insights from synaesthesia [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 374.
  • Friederici, A., & Levelt, W. J. M. (1987). Spatial description in microgravity: Aspects of cognitive adaptation. In P. R. Sahm, R. Jansen, & M. Keller (Eds.), Proceedings of the Norderney Symposium on Scientific Results of the German Spacelab Mission D1 (pp. 518-524). Köln, Germany: Wissenschaftliche Projektführung D1 c/o DFVLR.
  • Frost, R. L. A., Isbilen, E. S., Christiansen, M. H., & Monaghan, P. (2019). Testing the limits of non-adjacent dependency learning: Statistical segmentation and generalisation across domains. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 1787-1793). Montreal, QC: Cognitive Science Society.

    Abstract

    Achieving linguistic proficiency requires identifying words from speech, and discovering the constraints that govern the way those words are used. In a recent study of non-adjacent dependency learning, Frost and Monaghan (2016) demonstrated that learners may perform these tasks together, using similar statistical processes, contrary to prior suggestions. However, in their study, non-adjacent dependencies were marked by phonological cues (plosive-continuant-plosive structure), which may have influenced learning. Here, we test the necessity of these cues by comparing learning across three conditions: fixed phonology, which contains these cues; varied phonology, which omits them; and shapes, which uses visual shape sequences to assess the generality of statistical processing for these tasks. Participants segmented the sequences and generalized the structure in both auditory conditions, but learning was best when phonological cues were present. Learning was around chance on both tasks for the visual shapes group, indicating that statistical processing may critically differ across domains.
  • Galke, L., Vagliano, I., & Scherp, A. (2019). Can graph neural networks go "online"? An analysis of pretraining and inference. In Proceedings of the Representation Learning on Graphs and Manifolds: ICLR2019 Workshop.

    Abstract

    Large-scale graph data in real-world applications is often not static but dynamic, i.e., new nodes and edges appear over time. Current graph convolution approaches are promising, especially when all the graph’s nodes and edges are available during training. When unseen nodes and edges are inserted after training, it has not yet been evaluated whether up-training or re-training from scratch is preferable. We construct an experimental setup in which we insert previously unseen nodes and edges after training and conduct a limited number of inference epochs. In this setup, we compare adapting pretrained graph neural networks against retraining from scratch. Our results show that pretrained models yield high accuracy scores on the unseen nodes and that pretraining is preferable to retraining from scratch. Our experiments represent a first step toward evaluating and developing truly online variants of graph neural networks.
  • Galke, L., Melnychuk, T., Seidlmayer, E., Trog, S., Foerstner, K., Schultz, C., & Tochtermann, K. (2019). Inductive learning of concept representations from library-scale bibliographic corpora. In K. David, K. Geihs, M. Lange, & G. Stumme (Eds.), Informatik 2019: 50 Jahre Gesellschaft für Informatik - Informatik für Gesellschaft (pp. 219-232). Bonn: Gesellschaft für Informatik e.V. doi:10.18420/inf2019_26.
  • Goldrick, M., Brehm, L., Cho, P. W., & Smolensky, P. (2019). Transient blend states and discrete agreement-driven errors in sentence production. In G. J. Snover, M. Nelson, B. O'Connor, & J. Pater (Eds.), Proceedings of the Society for Computation in Linguistics (SCiL 2019) (pp. 375-376). doi:10.7275/n0b2-5305.
  • Guirardello-Damian, R., & Skiba, R. (2002). Trumai Corpus: An example of presenting multi-media data in the IMDI-browser. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics (pp. 16-1-16-8). Paris: European Language Resources Association.

    Abstract

    Trumai, a genetically isolated language spoken in Brazil (Xingu reserve), is an example of an endangered language. Although the Trumai population consists of more than 100 individuals, only 51 people speak the language. The oral traditions are progressively dying. Given the current scenario, the documentation of this language and its cultural aspects is of great importance. In the framework of the DoBeS program (Documentation of Endangered Languages), the project "Documentation of Trumai" has selected and organized a collection of Trumai texts, with a multi-media representation of the corpus. Several kinds of information and data types are being included in the archive of the language: texts with audio and video recordings; written texts from educational materials; drawings; photos; songs; annotations in different formats; lexicon; field notes; results from scientific studies of the language (sound system, sketch grammar, comparative studies with other Xinguan languages), etc. All materials are integrated into the IMDI-Browser, a specialized tool for presenting and searching for linguistic data. This paper explores the processing phases and the results of the Trumai project taking into consideration the issue of how to combine the needs and wishes of field linguistics (content and research aspects) and the needs of archiving (structure and workflow aspects) in a well-organized corpus.
  • Gulrajani, G., & Harrison, D. (2002). SHAWEL: Sharable and interactive web-lexicons. In P. Austin, H. Dry, & P. Wittenburg (Eds.), Proceedings of the international LREC workshop on resources and tools in field linguistics (pp. 9-1-9-4). Paris: European Language Resources Association.

    Abstract

    A prototype lexicon tool was implemented to allow researchers to create lexicons of endangered languages collaboratively. Researchers documenting or analyzing a language increasingly often work at different locations. Lexicons that evolve through continuous interaction between collaborators can only be produced efficiently when they can be accessed and manipulated via the Internet. The SHAWEL tool was developed to address these needs; it makes use of a thin Java client and a central database solution.
  • Hahn, L. E., Ten Buuren, M., De Nijs, M., Snijders, T. M., & Fikkert, P. (2019). Acquiring novel words in a second language through mutual play with child songs - The Noplica Energy Center. In L. Nijs, H. Van Regenmortel, & C. Arculus (Eds.), MERYC19 Counterpoints of the senses: Bodily experiences in musical learning (pp. 78-87). Ghent, Belgium: EuNet MERYC 2019.

    Abstract

    Child songs are a great source for linguistic learning. Here we explore whether children can acquire novel words in a second language by playing a game featuring child songs in a playhouse. We present data from three studies that serve as scientific evidence for the functionality of one game of the playhouse: the Energy Center. For this game, three hand-bikes were mounted on a panel. When children start moving the hand-bikes, child songs start playing simultaneously. Once the children produce enough energy with the hand-bikes, the songs are additionally accompanied by the sounds of musical instruments. In our studies, children executed a picture-selection task to evaluate whether they acquired new vocabulary from the songs presented during the game. Two of our studies were run in the field, one at a Dutch and one at an Indian pre-school. The third study features data from a more controlled laboratory setting. Our results partly confirm that the Energy Center is a successful means to support vocabulary acquisition in a second language. More research with larger sample sizes and longer access to the Energy Center is needed to evaluate the overall functionality of the game. Based on informal observations at our test sites, however, we are certain that children do pick up linguistic content from the songs during play, as many of the children repeat words and phrases from songs they heard. We will follow up on these promising observations in future studies.
  • Harbusch, K., & Kempen, G. (2002). A quantitative model of word order and movement in English, Dutch and German complement constructions. In Proceedings of the 19th international conference on Computational linguistics. San Francisco: Morgan Kaufmann.

    Abstract

    We present a quantitative model of word order and movement constraints that enables a simple and uniform treatment of a seemingly heterogeneous collection of linear order phenomena in English, Dutch and German complement constructions (Wh-extraction, clause union, extraposition, verb clustering, particle movement, etc.). Underlying the scheme are central assumptions of the psycholinguistically motivated Performance Grammar (PG). Here we describe this formalism in declarative terms based on typed feature unification. PG allows a homogeneous treatment of both the within- and between-language variations of the ordering phenomena under discussion, which reduce to different settings of a small number of quantitative parameters.
  • Harmon, Z., & Kapatsinski, V. (2020). The best-laid plan of mice and men: Competition between top-down and preceding-item cues in plan execution. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 1674-1680). Montreal, QC: Cognitive Science Society.

    Abstract

    There is evidence that the process of executing a planned utterance involves the use of both preceding-context and top-down cues. Utterance-initial words are cued only by the top-down plan. In contrast, non-initial words are cued both by top-down cues and preceding-context cues. Co-existence of both cue types raises the question of how they interact during learning. We argue that this interaction is competitive: items that tend to be preceded by predictive preceding-context cues are harder to activate from the plan without this predictive context. A novel computational model of this competition is developed. The model is tested on a corpus of repetition disfluencies and shown to account for the influences on patterns of restarts during production. In particular, this model predicts a novel Initiation Effect: following an interruption, speakers re-initiate production from words that tend to occur in utterance-initial position, even when they are not initial in the interrupted utterance.
  • Hashemzadeh, M., Kaufeld, G., White, M., Martin, A. E., & Fyshe, A. (2020). From language to language-ish: How brain-like is an LSTM representation of nonsensical language stimuli? In T. Cohn, Y. He, & Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 645-655). Association for Computational Linguistics.

    Abstract

    The representations generated by many models of language (word embeddings, recurrent neural networks and transformers) correlate with brain activity recorded while people read. However, these decoding results are usually based on the brain’s reaction to syntactically and semantically sound language stimuli. In this study, we asked: how does an LSTM (long short-term memory) language model, trained (by and large) on semantically and syntactically intact language, represent a language sample with degraded semantic or syntactic information? Does the LSTM representation still resemble the brain’s reaction? We found that, even for some kinds of nonsensical language, there is a statistically significant relationship between the brain’s activity and the representations of an LSTM. This indicates that, at least in some instances, LSTMs and the human brain handle nonsensical data similarly.
  • De Heer Kloots, M., Carlson, D., Garcia, M., Kotz, S., Lowry, A., Poli-Nardi, L., de Reus, K., Rubio-García, A., Sroka, M., Varola, M., & Ravignani, A. (2020). Rhythmic perception, production and interactivity in harbour and grey seals. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 59-62). Nijmegen: The Evolution of Language Conferences.
  • Heilbron, M., Ehinger, B., Hagoort, P., & De Lange, F. P. (2019). Tracking naturalistic linguistic predictions with deep neural language models. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 424-427). doi:10.32470/CCN.2019.1096-0.

    Abstract

    Prediction in language has traditionally been studied using simple designs in which neural responses to expected and unexpected words are compared in a categorical fashion. However, these designs have been contested as being ‘prediction encouraging’, potentially exaggerating the importance of prediction in language understanding. A few recent studies have begun to address these worries by using model-based approaches to probe the effects of linguistic predictability in naturalistic stimuli (e.g., continuous narrative). However, these studies have so far only looked at very local forms of prediction, using models that take no more than the prior two words into account when computing a word’s predictability. Here, we extend this approach using a state-of-the-art neural language model that can take roughly 500 times longer linguistic contexts into account. Predictability estimates from the neural network offer a much better fit to EEG data from subjects listening to naturalistic narrative than simpler models, and reveal strong surprise responses akin to the P200 and N400. These results show that predictability effects in language are not a side-effect of simple designs, and demonstrate the practical use of recent advances in AI for the cognitive neuroscience of language.
  • Hoeksema, N., Villanueva, S., Mengede, J., Salazar-Casals, A., Rubio-García, A., Curcic-Blake, B., Vernes, S. C., & Ravignani, A. (2020). Neuroanatomy of the grey seal brain: Bringing pinnipeds into the neurobiological study of vocal learning. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 162-164). Nijmegen: The Evolution of Language Conferences.
  • Hoeksema, N., Wiesmann, M., Kiliaan, A., Hagoort, P., & Vernes, S. C. (2020). Bats and the comparative neurobiology of vocal learning. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 165-167). Nijmegen: The Evolution of Language Conferences.
  • Janse, E. (2002). Time-compressing natural and synthetic speech. In Proceedings of 7th International Conference on Spoken Language Processing (pp. 1645-1648).
  • Joo, H., Jang, J., Kim, S., Cho, T., & Cutler, A. (2019). Prosodic structural effects on coarticulatory vowel nasalization in Australian English in comparison to American English. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 835-839). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study investigates effects of prosodic factors (prominence, boundary) on coarticulatory V-nasalization in Australian English (AusE) in CVN and NVC in comparison to those in American English (AmE). As in AmE, prominence was found to lengthen N but to reduce V-nasalization, enhancing N’s nasality and V’s orality, respectively (paradigmatic contrast enhancement). However, the prominence effect in CVN was more robust than in AmE. Again similar to findings in AmE, boundary induced a reduction of N-duration and V-nasalization phrase-initially (syntagmatic contrast enhancement), and increased the nasality of both C and V phrase-finally. However, AusE showed some differences in terms of the magnitude of V-nasalization and N-duration. The results suggest that the linguistic contrast enhancements underlie prosodic-structure modulation of coarticulatory V-nasalization in comparable ways across dialects, while the fine phonetic detail indicates that the phonetics-prosody interplay is internalized in the individual dialect’s phonetic grammar.
  • Kearns, R. K., Norris, D., & Cutler, A. (2002). Syllable processing in English. In Proceedings of the 7th International Conference on Spoken Language Processing [ICSLP 2002] (pp. 1657-1660).

    Abstract

    We describe a reaction time study in which listeners detected word or nonword syllable targets (e.g. zoo, trel) in sequences consisting of the target plus a consonant or syllable residue (trelsh, trelshek). The pattern of responses differed from an earlier word-spotting study with the same material, in which words were always harder to find if only a consonant residue remained. The earlier results should thus not be viewed in terms of syllabic parsing, but in terms of a universal role for syllables in speech perception; words which are accidentally present in spoken input (e.g. sell in self) can be rejected when they leave a residue of the input which could not itself be a word.
  • Kempen, G., & Van Breugel, C. (2002). A workbench for visual-interactive grammar instruction at the secondary education level. In Proceedings of the 10th International CALL Conference (pp. 157-158). Antwerp: University of Antwerp.
  • Kempen, G., & Harbusch, K. (2002). Rethinking the architecture of human syntactic processing: The relationship between grammatical encoding and decoding. In Proceedings of the 35th Meeting of the Societas Linguistica Europaea. University of Potsdam.
  • Khoe, Y. H., Tsoukala, C., Kootstra, G. J., & Frank, S. L. (2020). Modeling cross-language structural priming in sentence production. In T. C. Stewart (Ed.), Proceedings of the 18th Annual Meeting of the International Conference on Cognitive Modeling (pp. 131-137). University Park, PA, USA: The Penn State Applied Cognitive Science Lab.

    Abstract

    A central question in the psycholinguistic study of multilingualism is how syntax is shared across languages. We implement a model to investigate whether error-based implicit learning can provide an account of cross-language structural priming. The model is based on the Dual-path model of sentence production (Chang, 2002). We implement our model using the Bilingual version of Dual-path (Tsoukala, Frank, & Broersma, 2017). We answer two main questions: (1) Can structural priming of active and passive constructions occur between English and Spanish in a bilingual version of the Dual-path model? (2) Does cross-language priming differ quantitatively from within-language priming in this model? Our results show that cross-language priming does occur in the model. This finding adds to the viability of implicit learning as an account of structural priming in general and cross-language structural priming specifically. Furthermore, we find that the within-language priming effect is somewhat stronger than the cross-language effect. In the context of mixed results from behavioral studies, we interpret the latter finding as an indication that the difference between cross-language and within-language priming is small and difficult to detect statistically.
  • Klein, W. (Ed.). (2002). Sprache des Rechts II [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 128.
  • Klein, W., & Jungbluth, K. (Eds.). (2002). Deixis [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 125.
  • Klein, W. (Ed.). (1992). Textlinguistik [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (86).
  • Klein, W. (Ed.). (1987). Sprache und Ritual [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (65).
  • Kuijpers, C., Van Donselaar, W., & Cutler, A. (2002). Perceptual effects of assimilation-induced violation of final devoicing in Dutch. In J. H. L. Hansen, & B. Pellom (Eds.), The 7th International Conference on Spoken Language Processing (pp. 1661-1664). Denver: ISCA.

    Abstract

    Voice assimilation in Dutch is an optional phonological rule which changes the surface forms of words and in doing so may violate the otherwise obligatory phonological rule of syllable-final devoicing. We report two experiments examining the influence of voice assimilation on phoneme processing, in lexical compound words and in noun-verb phrases. Processing was not impaired in appropriate assimilation contexts across morpheme boundaries, but was impaired when devoicing was violated (a) in an inappropriate (non-assimilatory) context, or (b) across a syntactic boundary.
  • Kuntay, A., & Ozyurek, A. (2002). Joint attention and the development of the use of demonstrative pronouns in Turkish. In B. Skarabela, S. Fish, & A. H. Do (Eds.), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 336-347). Somerville, MA: Cascadilla Press.
  • Lattenkamp, E. Z., Linnenschmidt, M., Mardus, E., Vernes, S. C., Wiegrebe, L., & Schutte, M. (2020). Impact of auditory feedback on bat vocal development. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 249-251). Nijmegen: The Evolution of Language Conferences.
  • Lei, L., Raviv, L., & Alday, P. M. (2020). Using spatial visualizations and real-world social networks to understand language evolution and change. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 252-254). Nijmegen: The Evolution of Language Conferences.
  • De León, L., & Levinson, S. C. (Eds.). (1992). Space in Mesoamerican languages [Special Issue]. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 45(6).
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Levshina, N. (2020). How tight is your language? A semantic typology based on Mutual Information. In K. Evang, L. Kallmeyer, R. Ehren, S. Petitjean, E. Seyffarth, & D. Seddah (Eds.), Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories (pp. 70-78). Düsseldorf, Germany: Association for Computational Linguistics. doi:10.18653/v1/2020.tlt-1.7.

    Abstract

    Languages differ in the degree of semantic flexibility of their syntactic roles. For example, English and Indonesian are considered more flexible with regard to the semantics of subjects, whereas German and Japanese are less flexible. In Hawkins’ classification, more flexible languages are said to have a loose fit, and less flexible ones are those that have a tight fit. This classification has been based on manual inspection of example sentences. The present paper proposes a new, quantitative approach to deriving the measures of looseness and tightness from corpora. We use corpora of online news from the Leipzig Corpora Collection in thirty typologically and genealogically diverse languages and parse them syntactically with the help of the Universal Dependencies annotation software. Next, we compute Mutual Information scores for each language using the matrices of lexical lemmas and four syntactic dependencies (intransitive subjects, transitive subjects, objects and obliques). The new approach allows us not only to reproduce the results of previous investigations, but also to extend the typology to new languages. We also demonstrate that verb-final languages tend to have a tighter relationship between lexemes and syntactic roles, which helps language users to recognize thematic roles early during comprehension.

  • Liu, S., & Zhang, Y. (2019). Why some verbs are harder to learn than others – A micro-level analysis of everyday learning contexts for early verb learning. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2173-2178). Montreal, QC: Cognitive Science Society.

    Abstract

    Verb learning is important for young children. While most previous research has focused on linguistic and conceptual challenges in early verb learning (e.g., Gentner, 1982, 2006), the present paper examined early verb learning at the attentional level and quantified the input for early verb learning by measuring verb-action co-occurrence statistics in parent-child interaction from the learner’s perspective. To do so, we used head-mounted eye tracking to record fine-grained multimodal behaviors during parent-infant joint play, and analyzed parent speech, parent and infant action, and infant attention at the moments when parents produced verb labels. Our results show great variability across different action verbs in terms of frequency of verb utterances, frequency of corresponding actions related to verb meanings, and infants’ attention to verbs and actions, which provides new insights into why some verbs are harder to learn than others.
  • MacDonald, K., Räsänen, O., Casillas, M., & Warlaumont, A. S. (2020). Measuring prosodic predictability in children’s home language environments. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (Eds.), Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 695-701). Montreal, QC: Cognitive Science Society.

    Abstract

    Children learn language from the speech in their home environment. Recent work shows that more infant-directed speech (IDS) leads to stronger lexical development. But what makes IDS a particularly useful learning signal? Here, we expand on an attention-based account first proposed by Räsänen et al. (2018): that prosodic modifications make IDS less predictable, and thus more interesting. First, we reproduce the critical finding from Räsänen et al.: that lab-recorded IDS pitch is less predictable compared to adult-directed speech (ADS). Next, we show that this result generalizes to the home language environment, finding that IDS in daylong recordings is also less predictable than ADS but that this pattern is much less robust than for IDS recorded in the lab. These results link experimental work on attention and prosodic modifications of IDS to real-world language-learning environments, highlighting some challenges of scaling up analyses of IDS to larger datasets that better capture children’s actual input.
  • Mai, F., Galke, L., & Scherp, A. (2019). CBOW is not all you need: Combining CBOW with the compositional matrix space model. In Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). OpenReview.net.

    Abstract

    Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW's word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2%.
  • Yu, J., Mailhammer, R., & Cutler, A. (2020). Vocabulary structure affects word recognition: Evidence from German listeners. In N. Minematsu, M. Kondo, T. Arai, & R. Hayashi (Eds.), Proceedings of Speech Prosody 2020 (pp. 474-478). Tokyo: ISCA. doi:10.21437/SpeechProsody.2020-97.

    Abstract

    Lexical stress is realised similarly in English, German, and Dutch. On a suprasegmental level, stressed syllables tend to be longer and more acoustically salient than unstressed syllables; segmentally, vowels in unstressed syllables are often reduced. The frequency of unreduced unstressed syllables (where only the suprasegmental cues indicate lack of stress), however, differs across the languages. The present studies test whether listener behaviour is affected by these vocabulary differences, by investigating German listeners’ use of suprasegmental cues to lexical stress in German and English word recognition. In a forced-choice identification task, German listeners correctly assigned single-syllable fragments (e.g., Kon-) to one of two words differing in stress (KONto, konZEPT). Thus, German listeners can exploit suprasegmental information for identifying words. German listeners also performed above chance in a similar task in English (with, e.g., DIver, diVERT), i.e., their sensitivity to these cues also transferred to a non-native language. An English listener group, in contrast, failed in the English fragment task. These findings mirror vocabulary patterns: German has more words with unreduced unstressed syllables than English does.
  • Mamus, E., Rissman, L., Majid, A., & Ozyurek, A. (2019). Effects of blindfolding on verbal and gestural expression of path in auditory motion events. In A. K. Goel, C. M. Seifert, & C. C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2275-2281). Montreal, QC: Cognitive Science Society.

    Abstract

    Studies have claimed that blind people’s spatial representations differ from those of sighted people, and that blind people display superior auditory processing. Due to the nature of auditory and haptic information, it has been proposed that blind people have spatial representations that are more sequential than those of sighted people. Even a temporary loss of sight, such as through blindfolding, can affect spatial representations, but little research has been done on this topic. We compared blindfolded and sighted people’s linguistic spatial expressions and non-linguistic localization accuracy to test how blindfolding affects the representation of path in auditory motion events. We found that blindfolded people were as good as sighted people when localizing simple sounds, but they outperformed sighted people when localizing auditory motion events. Blindfolded people’s path-related speech also included more sequential and fewer holistic elements. Our results indicate that even a temporary loss of sight influences spatial representations of auditory motion events.
  • Marcoux, K., & Ernestus, M. (2019). Differences between native and non-native Lombard speech in terms of pitch range. In M. Ochmann, M. Vorländer, & J. Fels (Eds.), Proceedings of the ICA 2019 and EAA Euroregio. 23rd International Congress on Acoustics, integrating 4th EAA Euroregio 2019 (pp. 5713-5720). Berlin: Deutsche Gesellschaft für Akustik.

    Abstract

    Lombard speech, speech produced in noise, is acoustically different from speech produced in quiet (plain speech) in several ways, including having a higher and wider F0 range (pitch). Extensive research on native Lombard speech does not consider that non-natives experience a higher cognitive load while producing speech and that the native language may influence the non-native speech. We investigated pitch range in plain and Lombard speech produced by native and non-native speakers. Dutch and American-English speakers read contrastive question-answer pairs in quiet and in noise in English, while the Dutch also read Dutch sentence pairs. We found that Lombard speech is characterized by a wider pitch range than plain speech for all speakers (native English, non-native English, and native Dutch). This shows that non-natives also widen their pitch range in Lombard speech. In early-focus sentences, we see the same increase in pitch range from plain to Lombard speech in native and non-native English, but a smaller increase in native Dutch. In late-focus sentences, we see the biggest increase for native English, followed by non-native English and then native Dutch. Together these results indicate an effect of the native language on non-native Lombard speech.
  • Marcoux, K., & Ernestus, M. (2019). Pitch in native and non-native Lombard speech. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2605-2609). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Lombard speech, speech produced in noise, is typically produced with a higher fundamental frequency (F0, pitch) than speech in quiet. This paper examined potential differences between native and non-native Lombard speech by analyzing median pitch in sentences with early or late focus produced in quiet and in noise. We found an increase in pitch in late-focus sentences in noise for Dutch speakers in both English and Dutch, and for American-English speakers in English. These results show that non-native speakers produce Lombard speech despite their higher cognitive load. For the early-focus sentences, we found a difference between the Dutch and the American-English speakers. Whereas the Dutch showed an increased F0 in noise in English and Dutch, the American-English speakers did not show this increase in English. Together, these results suggest that some acoustic characteristics of Lombard speech, such as pitch, may be language-specific, potentially resulting in the native language influencing non-native Lombard speech.
  • Matsuo, A., & Duffield, N. (2002). Assessing the generality of knowledge about English ellipsis in SLA. In J. Costa, & M. J. Freitas (Eds.), Proceedings of the GALA 2001 Conference on Language Acquisition (pp. 49-53). Lisboa: Associação Portuguesa de Linguística.
  • Matsuo, A., & Duffield, N. (2002). Finiteness and parallelism: Assessing the generality of knowledge about English ellipsis in SLA. In B. Skarabela, S. Fish, & A.-H.-J. Do (Eds.), Proceedings of the 26th Boston University Conference on Language Development (pp. 197-207). Somerville, Massachusetts: Cascadilla Press.
  • McQueen, J. M., & Cutler, A. (1992). Words within words: Lexical statistics and lexical access. In J. Ohala, T. Nearey, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 1 (pp. 221-224). Alberta: University of Alberta.

    Abstract

    This paper presents lexical statistics on the pattern of occurrence of words embedded in other words. We report the results of an analysis of 25,000 words, varying in length from two to six syllables, extracted from a phonetically-coded English dictionary (The Longman Dictionary of Contemporary English). Each syllable and each string of syllables within each word was checked against the dictionary. Two analyses are presented: the first used a complete list of polysyllables, with look-up on the entire dictionary; the second used a sublist of content words, counting only embedded words which were themselves content words. The results have important implications for models of human speech recognition. The efficiency of these models depends, in different ways, on the number and location of words within words.
  • Mengede, J., Devanna, P., Hörpel, S. G., Firzla, U., & Vernes, S. C. (2020). Studying the genetic bases of vocal learning in bats. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 280-282). Nijmegen: The Evolution of Language Conferences.
  • Merkx, D., Frank, S., & Ernestus, M. (2019). Language learning using speech to image retrieval. In Proceedings of Interspeech 2019 (pp. 1841-1845). doi:10.21437/Interspeech.2019-3067.

    Abstract

    Humans learn language by interacting with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech, but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling, and vectorial self-attention, we obtain a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
  • Moisik, S. R., Zhi Yun, D. P., & Dediu, D. (2019). Active adjustment of the cervical spine during pitch production compensates for shape: The ArtiVarK study. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 864-868). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    The anterior lordosis of the cervical spine is thought to contribute to pitch (fo) production by influencing cricoid rotation as a function of larynx height. This study examines inter-individual variation in cervical spine shape and whether this variation influences how fo is produced along increasing or decreasing scales, using the ArtiVarK dataset, which contains real-time MRI pitch production data. We find that the cervical spine actively participates in fo production, but the amount of displacement depends on individual shape. In general, anterior spine motion (tending toward cervical lordosis) occurs for low fo, while posterior movement (tending toward cervical kyphosis) occurs for high fo.
  • Mudd, K., Lutzenberger, H., De Vos, C., Fikkert, P., Crasborn, O., & De Boer, B. (2020). How does social structure shape language variation? A case study of the Kata Kolok lexicon. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 302-304). Nijmegen: The Evolution of Language Conferences.
  • Nijveld, A., Ten Bosch, L., & Ernestus, M. (2019). ERP signal analysis with temporal resolution using a time window bank. In Proceedings of Interspeech 2019 (pp. 1208-1212). doi:10.21437/Interspeech.2019-2729.

    Abstract

    To study the cognitive processes underlying speech comprehension, neurophysiological measures (e.g., EEG and MEG) or behavioural measures (e.g., reaction times and response accuracy) can be applied. Compared to behavioural measures, EEG signals can provide a more fine-grained and complementary view of the processes that take place during the unfolding of an auditory stimulus.

    EEG signals are often analysed within specific time windows, which are usually based on the temporal structure of ERP components expected to be sensitive to the experimental manipulation. However, as the timing of ERP components may vary between experiments, trials, and participants, such a priori defined analysis time windows may significantly hamper the exploratory power of the analysis of components of interest. In this paper, we explore a wide-window analysis method applied to EEG signals collected in an auditory repetition priming experiment.

    This approach is based on a bank of temporal filters arranged along the time axis in combination with linear mixed-effects modelling. Crucially, it permits a temporal decomposition of effects in a single comprehensive statistical model that captures the entire EEG trace.
  • Norris, D., Van Ooijen, B., & Cutler, A. (1992). Speeded detection of vowels and steady-state consonants. In J. Ohala, T. Nearey, & B. Derwing (Eds.), Proceedings of the Second International Conference on Spoken Language Processing: Vol. 2 (pp. 1055-1058). Alberta: University of Alberta.

    Abstract

    We report two experiments in which vowels and steady-state consonants served as targets in a speeded detection task. In the first experiment, two vowels were compared with one voiced and one unvoiced fricative. Response times (RTs) to the vowels were longer than to the fricatives. The error rate was higher for the consonants. Consonants in word-final position produced the shortest RTs. For the vowels, RT correlated negatively with target duration. In the second experiment, the same two vowel targets were compared with two nasals. This time there was no significant difference in RTs, but the error rate was still significantly higher for the consonants. Error rate and length correlated negatively for the vowels only. We conclude that RT differences between phonemes are independent of vocalic or consonantal status. Instead, we argue that the process of phoneme detection reflects more finely grained differences in acoustic/articulatory structure within the phonemic repertoire.
  • Oostdijk, N., Goedertier, W., Van Eynde, F., Boves, L., Martens, J.-P., Moortgat, M., & Baayen, R. H. (2002). Experiences from the Spoken Dutch Corpus Project. In Third International Conference on Language Resources and Evaluation (pp. 340-347). Paris: European Language Resources Association.
  • Ozyurek, A. (2020). From hands to brains: How does human body talk, think and interact in face-to-face language use? In K. Truong, D. Heylen, & M. Czerwinski (Eds.), ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 1-2). New York, NY, USA: Association for Computing Machinery. doi:10.1145/3382507.3419442.
  • Ozyurek, A. (2002). Speech-gesture relationship across languages and in second language learners: Implications for spatial thinking and speaking. In B. Skarabela, S. Fish, & A. H. Do (Eds.), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 500-509). Somerville, MA: Cascadilla Press.
  • Paplu, S. H., Mishra, C., & Berns, K. (2020). Pseudo-randomization in automating robot behaviour during human-robot interaction. In 2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 1-6). Institute of Electrical and Electronics Engineers. doi:10.1109/ICDL-EpiRob48136.2020.9278115.

    Abstract

    Automating robot behaviour in a specific situation is an active area of research. The robotics literature offers several approaches to the automatic behaviour of a robot. However, for humanoids, and for human-robot interaction in general, the area has been less explored. In this paper, we introduce a pseudo-randomization approach to automate the gestures and facial expressions of an interactive humanoid robot called ROBIN based on its mental state. A significant number of gestures and facial expressions have been implemented to give the robot more options for performing a relevant action or reaction based on visual stimuli. The robot's behaviour differs noticeably across encounters with the same stimuli perceived from an interaction partner. This slight autonomous variation in the robot's behaviour demonstrates a degree of behavioural automation. The results from experimental scenarios and human-centered evaluation of the system help validate the approach.
  • Parhammer*, S. I., Ebersberg*, M., Tippmann*, J., Stärk*, K., Opitz, A., Hinger, B., & Rossi, S. (2019). The influence of distraction on speech processing: How selective is selective attention? In Proceedings of Interspeech 2019 (pp. 3093-3097). doi:10.21437/Interspeech.2019-2699.

    Abstract

    * indicates shared first authorship.
    The present study investigated the effects of selective attention on the processing of morphosyntactic errors in unattended parts of speech. Two groups of German native (L1) speakers participated in the present study. Participants listened to sentences in which irregular verbs were manipulated in three different conditions (correct, incorrect but attested ablaut pattern, incorrect and crosslinguistically unattested ablaut pattern). In order to track fast dynamic neural reactions to the stimuli, electroencephalography was used. After each sentence, participants in Experiment 1 performed a semantic judgement task, which deliberately distracted the participants from the syntactic manipulations and directed their attention to the semantic content of the sentence. In Experiment 2, participants carried out a syntactic judgement task, which put their attention on the critical stimuli. The use of two different attentional tasks allowed for investigating the impact of selective attention on speech processing and whether morphosyntactic processing steps are performed automatically. In Experiment 2, the incorrect attested condition elicited a larger N400 component compared to the correct condition, whereas in Experiment 1 no differences between conditions were found. These results suggest that the processing of morphosyntactic violations in irregular verbs is not entirely automatic but seems to be strongly affected by selective attention.
  • Petersson, K. M. (2002). Brain physiology. In R. Behn, & C. Veranda (Eds.), Proceedings of The 4th Southern European School of the European Physical Society - Physics in Medicine (pp. 37-38). Montreux: ESF.
  • Pouw, W., Paxton, A., Harrison, S. J., & Dixon, J. A. (2019). Acoustic specification of upper limb movement in voicing. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 68-74). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.
  • Pouw, W., & Dixon, J. A. (2019). Quantifying gesture-speech synchrony. In A. Grimminger (Ed.), Proceedings of the 6th Gesture and Speech in Interaction – GESPIN 6 (pp. 75-80). Paderborn: Universitaetsbibliothek Paderborn. doi:10.17619/UNIPB/1-812.

    Abstract

    Spontaneously occurring speech is often seamlessly accompanied by hand gestures. Detailed observations of video data suggest that speech and gesture are tightly synchronized in time, consistent with a dynamic interplay between body and mind. However, spontaneous gesture-speech synchrony has rarely been objectively quantified beyond analyses of video data, which do not allow for identification of kinematic properties of gestures. Consequently, the point in gesture which is held to couple with speech, the so-called moment of “maximum effort”, has been variably equated with the peak velocity, peak acceleration, peak deceleration, or the onset of the gesture. In the current exploratory report, we provide novel evidence from motion-tracking and acoustic data that peak velocity is closely aligned with, and shortly leads, the peak pitch (F0) of speech.

    Additional information

    https://osf.io/9843h/
  • Rasenberg, M., Dingemanse, M., & Ozyurek, A. (2020). Lexical and gestural alignment in interaction and the emergence of novel shared symbols. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 356-358). Nijmegen: The Evolution of Language Conferences.
  • Raviv, L., Meyer, A. S., & Lev-Ari, S. (2020). Network structure and the cultural evolution of linguistic structure: A group communication experiment. In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 359-361). Nijmegen: The Evolution of Language Conferences.
  • de Reus, K., Carlson, D., Jadoul, Y., Lowry, A., Gross, S., Garcia, M., Salazar-Casals, A., Rubio-García, A., Haas, C. E., De Boer, B., & Ravignani, A. (2020). Relationships between vocal ontogeny and vocal tract anatomy in harbour seals (Phoca vitulina). In A. Ravignani, C. Barbieri, M. Flaherty, Y. Jadoul, E. Lattenkamp, H. Little, M. Martins, K. Mudd, & T. Verhoef (Eds.), The Evolution of Language: Proceedings of the 13th International Conference (Evolang13) (pp. 63-66). Nijmegen: The Evolution of Language Conferences.
  • Rissman, L., & Majid, A. (2019). Agency drives category structure in instrumental events. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2661-2667). Montreal, QC: Cognitive Science Society.

    Abstract

    Thematic roles such as Agent and Instrument have a long-standing place in theories of event representation. Nonetheless, the structure of these categories has been difficult to determine. We investigated how instrumental events, such as someone slicing bread with a knife, are categorized in English. Speakers described a variety of typical and atypical instrumental events, and we determined the similarity structure of their descriptions using correspondence analysis. We found that events where the instrument is an extension of an intentional agent were most likely to elicit similar language, highlighting the importance of agency in structuring instrumental categories.
  • Scharenborg, O., Boves, L., & de Veth, J. (2002). ASR in a human word recognition model: Generating phonemic input for Shortlist. In J. H. L. Hansen, & B. Pellom (Eds.), ICSLP 2002 - INTERSPEECH 2002 - 7th International Conference on Spoken Language Processing (pp. 633-636). ISCA Archive.

    Abstract

    The current version of the psycholinguistic model of human word recognition Shortlist suffers from two unrealistic constraints. First, the input of Shortlist must consist of a single string of phoneme symbols. Second, the current version of the search in Shortlist makes it difficult to deal with insertions and deletions in the input phoneme string. This research attempts to derive, fully automatically, a phoneme string from the acoustic signal whose length is as close as possible to the number of phonemes in the lexical representation of the word. We optimised an Automatic Phone Recogniser (APR) using two approaches, viz. varying the value of the mismatch parameter and optimising the APR output strings on the output of Shortlist. The approaches show that it will be very difficult to satisfy the input requirements of the present version of Shortlist with a phoneme string generated by an APR.
  • Scharenborg, O., & Boves, L. (2002). Pronunciation variation modelling in a model of human word recognition. In Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology [PMLA-2002] (pp. 65-70).

    Abstract

    Due to pronunciation variation, many insertions and deletions of phones occur in spontaneous speech. The psycholinguistic model of human speech recognition Shortlist cannot deal well with phone insertions and deletions and is therefore not well suited for real-life input. The research presented in this paper explains how Shortlist can benefit from pronunciation variation modelling in dealing with real-life input. Pronunciation variation was modelled by including variants in the lexicon of Shortlist. A series of experiments was carried out to find the optimal acoustic model set for transcribing the training material that was used as the basis for generating the variants. The Shortlist experiments clearly showed that Shortlist benefits from pronunciation variation modelling. However, the performance of Shortlist remains far behind that of other, more conventional speech recognisers.
  • Schiller, N. O., Schmitt, B., Peters, J., & Levelt, W. J. M. (2002). 'BAnana' or 'baNAna'? Metrical encoding during speech production [Abstract]. In M. Baumann, A. Keinath, & J. Krems (Eds.), Experimentelle Psychologie: Abstracts der 44. Tagung experimentell arbeitender Psychologen (p. 195). TU Chemnitz, Philosophische Fakultät.

    Abstract

    The time course of metrical encoding, i.e. stress, during speech production is investigated. In a first experiment, participants were presented with pictures whose bisyllabic Dutch names had initial or final stress (KAno 'canoe' vs. kaNON 'cannon'; capital letters indicate stressed syllables). Picture names were matched for frequency and object recognition latencies. When participants were asked to judge whether picture names had stress on the first or second syllable, they showed significantly faster decision times for initially stressed targets than for targets with final stress. Experiment 2 replicated this effect with trisyllabic picture names (faster RTs for penultimate stress than for ultimate stress). In our view, these results reflect the incremental phonological encoding process. Wheeldon and Levelt (1995) found that segmental encoding is a process running from the beginning to the end of words. Here, we present evidence that the metrical pattern of words, i.e. stress, is also encoded incrementally.