Publications

  • Adank, P., & McQueen, J. M. (2007). The effect of an unfamiliar regional accent on spoken-word comprehension. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1925-1928). Dudweiler: Pirrot.

    Abstract

    This study aimed first to determine whether there is a delay associated with processing words in an unfamiliar regional accent compared to words in a familiar regional accent, and second to establish whether short-term exposure to an unfamiliar accent affects the speed and accuracy of comprehension of words spoken in that accent. Listeners performed an animacy decision task for words spoken in their own and in an unfamiliar accent. Next, they were exposed to approximately 20 minutes of speech in one of these two accents. After exposure, they repeated the animacy decision task. Results showed a considerable delay in word processing for the unfamiliar accent, but no effect of short-term exposure.
  • Adank, P., Smits, R., & Van Hout, R. (2003). Modeling perceived vowel height, advancement, and rounding. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 647-650). Adelaide: Causal Productions.
  • Amatuni, A., Schroer, S. E., Zhang, Y., Peters, R. E., Reza, M. A., Crandall, D., & Yu, C. (2021). In-the-moment visual information from the infant's egocentric view determines the success of infant word learning: A computational study. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 265-271). Vienna: Cognitive Science Society.

    Abstract

    Infants learn the meaning of words from accumulated experiences of real-time interactions with their caregivers. To study the effects of visual sensory input on word learning, we recorded infants' view of the world using head-mounted eye trackers during free-flowing play with a caregiver. While playing, infants were exposed to novel label-object mappings and later learning outcomes for these items were tested after the play session. In this study we use a classification-based approach to link properties of infants' visual scenes during naturalistic labeling moments to their word learning outcomes. We find that a model which integrates both highly informative and ambiguous sensory evidence is a better fit to infants' individual learning outcomes than models where either type of evidence is taken alone, and that raw labeling frequency is unable to account for the word learning differences we observe. Here we demonstrate how a computational model, using only raw pixels taken from the egocentric scene image, can derive insights on human language learning.
  • Ameka, F. K., & Levinson, S. C. (Eds.). (2007). The typology and semantics of locative predication: Posturals, positionals and other beasts [Special Issue]. Linguistics, 45(5).

    Abstract

    This special issue is devoted to a relatively neglected topic in linguistics, namely the verbal component of locative statements. English tends, of course, to use a simple copula in utterances like “The cup is on the table”, but many languages, perhaps as many as half of the world's languages, have a set of alternate verbs, or alternate verbal affixes, which contrast in this slot. Often these are classificatory verbs of ‘sitting’, ‘standing’ and ‘lying’. For this reason, perhaps, Aristotle listed position among his basic (“noncomposite”) categories.
  • Anastasopoulos, A., Lekakou, M., Quer, J., Zimianiti, E., DeBenedetto, J., & Chiang, D. (2018). Part-of-speech tagging on an endangered language: a parallel Griko-Italian Resource. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) (pp. 2529-2539).

    Abstract

    Most work on part-of-speech (POS) tagging is focused on high resource languages, or examines low-resource and active learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and provides gold annotations for the test set. Based on a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project cross-lingual POS tags. We show that the combination of a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an applied active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.
  • Andics, A., McQueen, J. M., & Van Turennout, M. (2007). Phonetic content influences voice discriminability. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1829-1832). Dudweiler: Pirrot.

    Abstract

    We present results from an experiment which shows that voice perception is influenced by the phonetic content of speech. Dutch listeners were presented with thirteen speakers pronouncing CVC words with systematically varying segmental content, and they had to discriminate the speakers’ voices. Results show that certain segments help listeners discriminate voices more than other segments do. Voice information can be extracted from every segmental position of a monosyllabic word and is processed rapidly. We also show that although relative discriminability within a closed set of voices appears to be a stable property of a voice, it is also influenced by segmental cues – that is, perceived uniqueness of a voice depends on what that voice says.
  • Bauer, B. L. M. (2003). The adverbial formation in mente in Vulgar and Late Latin: A problem in grammaticalization. In H. Solin, M. Leiwo, & H. Hallo-aho (Eds.), Latin vulgaire, latin tardif VI (pp. 439-457). Hildesheim: Olms.
  • Bentz, C., Dediu, D., Verkerk, A., & Jäger, G. (2018). Language family trees reflect geography and demography beyond neutral drift. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 38-40). Toruń, Poland: NCU Press. doi:10.12775/3991-1.006.
  • Bodur, K., Branje, S., Peirolo, M., Tiscareno, I., & German, J. S. (2021). Domain-initial strengthening in Turkish: Acoustic cues to prosodic hierarchy in stop consonants. In Proceedings of Interspeech 2021 (pp. 1459-1463). doi:10.21437/Interspeech.2021-2230.

    Abstract

    Studies have shown that cross-linguistically, consonants at the left edge of higher-level prosodic boundaries tend to be more forcefully articulated than those at lower-level boundaries, a phenomenon known as domain-initial strengthening. This study tests whether similar effects occur in Turkish, using the Autosegmental-Metrical model proposed by Ipek & Jun [1, 2] as the basis for assessing boundary strength. Productions of /t/ and /d/ were elicited in four domain-initial prosodic positions corresponding to progressively higher-level boundaries: syllable, word, intermediate phrase, and Intonational Phrase. A fifth position, nuclear word, was included in order to better situate it within the prosodic hierarchy. Acoustic correlates of articulatory strength were measured, including closure duration for /d/ and /t/, as well as voice onset time and burst energy for /t/. Our results show that closure duration increases cumulatively from syllable to intermediate phrase, while voice onset time and burst energy are not influenced by boundary strength. These findings provide corroborating evidence for Ipek & Jun’s model, particularly for the distinction between word and intermediate phrase boundaries. Additionally, articulatory strength at the left edge of the nuclear word patterned closely with word-initial position, supporting the view that the nuclear word is not associated with a distinct phrasing domain.
  • Brand, J., Monaghan, P., & Walker, P. (2018). Changing Signs: Testing How Sound-Symbolism Supports Early Word Learning. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1398-1403). Austin, TX: Cognitive Science Society.

    Abstract

    Learning a language involves learning how to map specific forms onto their associated meanings. Such mappings can utilise arbitrariness and non-arbitrariness, yet how these two systems operate at different stages of vocabulary development is still not fully understood. The Sound-Symbolism Bootstrapping Hypothesis (SSBH) proposes that sound-symbolism is essential for word learning to commence, but empirical evidence of exactly how sound-symbolism influences language learning is still sparse. It may be the case that sound-symbolism supports acquisition of categories of meaning, or that it enables acquisition of individualized word meanings. In two experiments where participants learned form-meaning mappings from either sound-symbolic or arbitrary languages, we demonstrate the changing roles of sound-symbolism and arbitrariness for different vocabulary sizes, showing that sound-symbolism provides an advantage for learning of broad categories, which may then transfer to support learning individual words, whereas an arbitrary language impedes acquisition of categories of sound to meaning.
  • Braun, B. (2007). Effects of dialect and context on the realisation of German prenuclear accents. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 961-964). Dudweiler: Pirrot.

    Abstract

    We investigated whether alignment differences reported for Southern and Northern German speakers (Southerners align peaks in prenuclear accents later than Northerners) are carried over to the production of different functional categories such as contrast. To this end, the realisation of non-contrastive theme accents is compared with those in contrastive theme-rheme pairs such as ‘Sam rented a truck and Johanna rented a car.’
    We found that when producing this ‘double-contrast’, speakers mark contrast both phonetically by delaying and rising the peak of the theme accent (‘Johanna’) and/or phonologically by a change in rheme accent type (from high to falling ‘car’).
    The effect of dialect is complex: a) only in non-contrastive contexts produced with a high rheme accent Southerners align peaks later than Northerners; b) peak delay as a means to signal functional contrast is not used uniformly by the two varieties. Dialect clearly affects the realisation of prenuclear accents but its effect is conditioned by the pragmatic and intonational context.
  • Broersma, M. (2007). Why the 'president' does not excite the 'press': The limits of spurious lexical activation in L2 listening. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1909-1912). Dudweiler: Pirrot.

    Abstract

    Two Cross-Modal Priming experiments assessed lexical activation of unintended words for nonnative (Dutch) and English native listeners. Stimuli mismatched words in final voicing, which in earlier studies caused spurious lexical activation for Dutch listeners. The stimuli were embedded in or cut out of a carrier (PRESident). The presence of a longer lexical competitor in the signal or as a possible continuation of it prevented spurious lexical activation of mismatching words (press).
  • Broersma, M., & Van de Ven, M. (2007). More flexible use of perceptual cues in nonnative than in native listening: Preceding vowel duration as a cue for final /v/-/f/. In Proceedings of the Fifth International Symposium on the Acquisition of Second Language Speech (New Sounds 2007).

    Abstract

    Three 2AFC experiments investigated Dutch and English listeners’ use of preceding vowel duration for the English final /v/-/f/ contrast. Dutch listeners used vowel duration more flexibly than English listeners did: they could use vowel duration as accurately as native listeners, but were better at ignoring it when it was misleading.
  • Broersma, M. (2007). Kettle hinders cat, shadow does not hinder shed: Activation of 'almost embedded' words in nonnative listening. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1893-1896). Adelaide: Causal Productions.

    Abstract

    A Cross-Modal Priming experiment investigated Dutch listeners’ perception of English words. Target words were embedded in a carrier word (e.g., cat in catalogue) or ‘almost embedded’ in a carrier word except for a mismatch in the perceptually difficult /æ/-/ε/ contrast (e.g., cat in kettle). Previous results showed a bias towards perception of /ε/ over /æ/. The present study shows that presentation of carrier words either containing an /æ/ or an /ε/ led to long lasting inhibition of embedded or ‘almost embedded’ words with an /æ/, but not of words with an /ε/. Thus, both catalogue and kettle hindered recognition of cat, whereas neither schedule nor shadow hindered recognition of shed.
  • Butterfield, S., & Cutler, A. (1988). Segmentation errors by human listeners: Evidence for a prosodic segmentation strategy. In W. Ainsworth, & J. Holmes (Eds.), Proceedings of SPEECH ’88: Seventh Symposium of the Federation of Acoustic Societies of Europe: Vol. 3 (pp. 827-833). Edinburgh: Institute of Acoustics.
  • Byun, K.-S., De Vos, C., Roberts, S. G., & Levinson, S. C. (2018). Interactive sequences modulate the selection of expressive forms in cross-signing. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 67-69). Toruń, Poland: NCU Press. doi:10.12775/3991-1.012.
  • Cablitz, G., Ringersma, J., & Kemps-Snijders, M. (2007). Visualizing endangered indigenous languages of French Polynesia with LEXUS. In Proceedings of the 11th International Conference Information Visualization (IV07) (pp. 409-414). IEEE Computer Society.

    Abstract

    This paper reports on the first results of the DOBES project ‘Towards a multimedia dictionary of the Marquesan and Tuamotuan languages of French Polynesia’. Within the framework of this project we are building a digital multimedia encyclopedic lexicon of the endangered Marquesan and Tuamotuan languages using a new tool, LEXUS. LEXUS is a web-based lexicon tool, targeted at linguists involved in language documentation. LEXUS offers the possibility to visualize language. It provides functionalities to include audio, video and still images to the lexical entries of the dictionary, as well as relational linking for the creation of a semantic network knowledge base. Further activities aim at the development of (1) an improved user interface in close cooperation with the speech community and (2) a collaborative workspace functionality which will allow the speech community to actively participate in the creation of lexica.
  • Chen, A., & Fikkert, P. (2007). Intonation of early two-word utterances in Dutch. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 315-320). Dudweiler: Pirrot.

    Abstract

    We analysed intonation contours of two-word utterances from three monolingual Dutch children aged between 1;4 and 2;1 in the autosegmental-metrical framework. Our data show that children have mastered the inventory of the boundary tones and nuclear pitch accent types (except for L*HL and L*!HL) at the 160-word level, and the set of non-downstepped pre-nuclear pitch accents (except for L*) at the 230-word level, contra previous claims on the mastery of adult-like intonation contours before or at the onset of first words. Further, there is evidence that intonational development is correlated with an increase in vocabulary size. Moreover, we found that children show a preference for falling contours, as predicted on the basis of universal production mechanisms. In addition, the utterances are mostly spoken with both words accented independent of semantic relations expressed and information status of each word across developmental stages, contra prior work. Our study suggests a number of topics for further research.
  • Chen, A. (2007). Intonational realisation of topic and focus by Dutch-acquiring 4- to 5-year-olds. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1553-1556). Dudweiler: Pirrot.

    Abstract

    This study examined how Dutch-acquiring 4- to 5-year-olds use different pitch accent types and deaccentuation to mark topic and focus at the sentence level and how they differ from adults. The topic and focus were non-contrastive and realised as full noun phrases. It was found that children realise topic and focus similarly frequently with H*L, whereas adults use H*L noticeably more frequently in focus than in topic in sentence-initial position and nearly only in focus in sentence-final position. Further, children frequently realise the topic with an accent, whereas adults mostly deaccent the sentence-final topic and use H*L and H* to realise the sentence-initial topic because of rhythmic motivation. These results show that 4- and 5-year-olds have not acquired H*L as the typical focus accent and deaccentuation as the typical topic intonation yet. Possibly, frequent use of H*L in sentence-initial topic in adult Dutch has made it difficult to extract the functions of H*L and deaccentuation from the input.
  • Chen, A. (2003). Language dependence in continuation intonation. In M. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1069-1072). Rundle Mall, SA, Austr.: Causal Productions Pty.
  • Chen, A. (2003). Reaction time as an indicator to discrete intonational contrasts in English. In Proceedings of Eurospeech 2003 (pp. 97-100).

    Abstract

    This paper reports a perceptual study using a semantically motivated identification task in which we investigated the nature of two pairs of intonational contrasts in English: (1) normal High accent vs. emphatic High accent; (2) early peak alignment vs. late peak alignment. Unlike previous inquiries, the present study employs an on-line method using the Reaction Time measurement, in addition to the measurement of response frequencies. Regarding the peak height continuum, the mean RTs are shortest for within-category identification but longest for across-category identification. As for the peak alignment contrast, no identification boundary emerges and the mean RTs only reflect a difference between peaks aligned with the vowel onset and peaks aligned elsewhere. We conclude that the peak height contrast is discrete but the previously claimed discreteness of the peak alignment contrast is not borne out.
  • Cho, T. (2003). Lexical stress, phrasal accent and prosodic boundaries in the realization of domain-initial stops in Dutch. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2657-2660). Adelaide: Causal Productions.

    Abstract

    This study examines the effects of prosodic boundaries, lexical stress, and phrasal accent on the acoustic realization of stops (/t, d/) in Dutch, with special attention paid to language-specificity in the phonetics-prosody interface. The results obtained from various acoustic measures show systematic phonetic variations in the production of /t, d/ as a function of prosodic position, which may be interpreted as being due to prosodically-conditioned articulatory strengthening. Shorter VOTs were found for the voiceless stop /t/ in prosodically stronger locations (as opposed to longer VOTs in this position in English). The results suggest that prosodically-driven phonetic realization is bounded by a language-specific phonological feature system.
  • Coopmans, C. W., De Hoop, H., Kaushik, K., Hagoort, P., & Martin, A. E. (2021). Structure-(in)dependent interpretation of phrases in humans and LSTMs. In Proceedings of the Society for Computation in Linguistics (SCiL 2021) (pp. 459-463).

    Abstract

    In this study, we compared the performance of a long short-term memory (LSTM) neural network to the behavior of human participants on a language task that requires hierarchically structured knowledge. We show that humans interpret ambiguous noun phrases, such as second blue ball, in line with their hierarchical constituent structure. LSTMs, instead, only do so after unambiguous training, and they do not systematically generalize to novel items. Overall, the results of our simulations indicate that a model can behave hierarchically without relying on hierarchical constituent structure.
  • Cristia, A., Ganesh, S., Casillas, M., & Ganapathy, S. (2018). Talker diarization in the wild: The case of child-centered daylong audio-recordings. In Proceedings of Interspeech 2018 (pp. 2583-2587). doi:10.21437/Interspeech.2018-2078.

    Abstract

    Speaker diarization (answering 'who spoke when') is a widely researched subject within speech technology. Numerous experiments have been run on datasets built from broadcast news, meeting data, and call centers—the task sometimes appears close to being solved. Much less work has begun to tackle the hardest diarization task of all: spontaneous conversations in real-world settings. Such diarization would be particularly useful for studies of language acquisition, where researchers investigate the speech children produce and hear in their daily lives. In this paper, we study audio gathered with a recorder worn by small children as they went about their normal days. As a result, each child was exposed to different acoustic environments with a multitude of background noises and a varying number of adults and peers. The inconsistency of speech and noise within and across samples poses a challenging task for speaker diarization systems, which we tackled via retraining and data augmentation techniques. We further studied sources of structured variation across raw audio files, including the impact of speaker type distribution, proportion of speech from children, and child age on diarization performance. We discuss the extent to which these findings might generalize to other samples of speech in the wild.
  • Cutler, A., Murty, L., & Otake, T. (2003). Rhythmic similarity effects in non-native listening? In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 329-332). Adelaide: Causal Productions.

    Abstract

    Listeners rely on native-language rhythm in segmenting speech; in different languages, stress-, syllable- or mora-based rhythm is exploited. This language-specificity affects listening to non-native speech, if native procedures are applied even though inefficient for the non-native language. However, speakers of two languages with similar rhythmic interpretation should segment their own and the other language similarly. This was observed to date only for related languages (English-Dutch; French-Spanish). We now report experiments in which Japanese listeners heard Telugu, a Dravidian language unrelated to Japanese, and Telugu listeners heard Japanese. In both cases detection of target sequences in speech was harder when target boundaries mismatched mora boundaries, exactly the pattern that Japanese listeners earlier exhibited with Japanese and other languages. These results suggest that Telugu and Japanese listeners use similar procedures in segmenting speech, and support the idea that languages fall into rhythmic classes, with aspects of phonological structure affecting listeners' speech segmentation.
  • Cutler, A., Aslin, R. N., Gervain, J., & Nespor, M. (Eds.). (2021). Special issue in honor of Jacques Mehler, Cognition's founding editor [Special Issue]. Cognition, 213.
  • Ip, M. H. K., & Cutler, A. (2018). Asymmetric efficiency of juncture perception in L1 and L2. In K. Klessa, J. Bachan, A. Wagner, M. Karpiński, & D. Śledziński (Eds.), Proceedings of Speech Prosody 2018 (pp. 289-296). Baixas, France: ISCA. doi:10.21437/SpeechProsody.2018-59.

    Abstract

    In two experiments, Mandarin listeners resolved potential syntactic ambiguities in spoken utterances in (a) their native language (L1) and (b) English which they had learned as a second language (L2). A new disambiguation task was used, requiring speeded responses to select the correct meaning for structurally ambiguous sentences. Importantly, the ambiguities used in the study are identical in Mandarin and in English, and production data show that prosodic disambiguation of this type of ambiguity is also realised very similarly in the two languages. The perceptual results here showed however that listeners’ response patterns differed for L1 and L2, although there was a significant increase in similarity between the two response patterns with increasing exposure to the L2. Thus identical ambiguity and comparable disambiguation patterns in L1 and L2 do not lead to immediate application of the appropriate L1 listening strategy to L2; instead, it appears that such a strategy may have to be learned anew for the L2.
  • Ip, M. H. K., & Cutler, A. (2018). Cue equivalence in prosodic entrainment for focus detection. In J. Epps, J. Wolfe, J. Smith, & C. Jones (Eds.), Proceedings of the 17th Australasian International Conference on Speech Science and Technology (pp. 153-156).

    Abstract

    Using a phoneme detection task, the present series of experiments examines whether listeners can entrain to different combinations of prosodic cues to predict where focus will fall in an utterance. The stimuli were recorded by four female native speakers of Australian English who happened to have used different prosodic cues to produce sentences with prosodic focus: a combination of duration cues, mean and maximum F0, F0 range, and longer pre-target interval before the focused word onset, only mean F0 cues, only pre-target interval, and only duration cues. Results revealed that listeners can entrain in almost every condition except for where duration was the only reliable cue. Our findings suggest that listeners are flexible in the cues they use for focus processing.
  • Cutler, A., Wales, R., Cooper, N., & Janssen, J. (2007). Dutch listeners' use of suprasegmental cues to English stress. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1913-1916). Dudweiler: Pirrot.

    Abstract

    Dutch listeners outperform native listeners in identifying syllable stress in English. This is because lexical stress is more useful in recognition of spoken words of Dutch than of English, so that Dutch listeners pay greater attention to stress in general. We examined Dutch listeners’ use of the acoustic correlates of English stress. Primary- and secondary-stressed syllables differ significantly on acoustic measures, and some differences, in F0 especially, correlate with data of earlier listening experiments. The correlations found in the Dutch responses were not paralleled in data from native listeners. Thus the acoustic cues which distinguish English primary versus secondary stress are better exploited by Dutch than by native listeners.
  • Cutler, A., & Weber, A. (2007). Listening experience and phonetic-to-lexical mapping in L2. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 43-48). Dudweiler: Pirrot.

    Abstract

    In contrast to initial L1 vocabularies, which of necessity depend largely on heard exemplars, L2 vocabulary construction can draw on a variety of knowledge sources. This can lead to richer stored knowledge about the phonology of the L2 than the listener's prelexical phonetic processing capacity can support, and thus to mismatch between the level of detail required for accurate lexical mapping and the level of detail delivered by the prelexical processor. Experiments on spoken word recognition in L2 have shown that phonetic contrasts which are not reliably perceived are represented in the lexicon nonetheless. This lexical representation of contrast must be based on abstract knowledge, not on veridical representation of heard exemplars. New experiments confirm that provision of abstract knowledge (in the form of spelling) can induce lexical representation of a contrast which is not reliably perceived; but also that experience (in the form of frequency of occurrence) modulates the mismatch of phonetic and lexical processing. We conclude that a correct account of word recognition in L2 (as indeed in L1) requires consideration of both abstract and episodic information.
  • Cutler, A., Cooke, M., Garcia-Lecumberri, M. L., & Pasveer, D. (2007). L2 consonant identification in noise: Cross-language comparisons. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1585-1588). Adelaide: Causal Productions.

    Abstract

    The difficulty of listening to speech in noise is exacerbated when the speech is in the listener’s L2 rather than L1. In this study, Spanish and Dutch users of English as an L2 identified American English consonants in a constant intervocalic context. Their performance was compared with that of L1 (British English) listeners, under quiet conditions and when the speech was masked by speech from another talker or by noise. Masking affected performance more for the Spanish listeners than for the L1 listeners, but not for the Dutch listeners, whose performance was worse than the L1 case to about the same degree in all conditions. There were, however, large differences in the pattern of results across individual consonants, which were consistent with differences in how consonants are identified in the respective L1s.
  • Cutler, A., Burchfield, L. A., & Antoniou, M. (2018). Factors affecting talker adaptation in a second language. In J. Epps, J. Wolfe, J. Smith, & C. Jones (Eds.), Proceedings of the 17th Australasian International Conference on Speech Science and Technology (pp. 33-36).

    Abstract

    Listeners adapt rapidly to previously unheard talkers by adjusting phoneme categories using lexical knowledge, in a process termed lexically-guided perceptual learning. Although this is firmly established for listening in the native language (L1), perceptual flexibility in second languages (L2) is as yet less well understood. We report two experiments examining L1 and L2 perceptual learning, the first in Mandarin-English late bilinguals, the second in Australian learners of Mandarin. Both studies showed stronger learning in L1; in L2, however, learning appeared for the English-L1 group but not for the Mandarin-L1 group. Phonological mapping differences from the L1 to the L2 are suggested as the reason for this result.
  • Declerck, T., Cunningham, H., Saggion, H., Kuper, J., Reidsma, D., & Wittenburg, P. (2003). MUMIS - Advanced information extraction for multimedia indexing and searching digital media - Processing for multimedia interactive services. 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 553-556.
  • Delgado, T., Ravignani, A., Verhoef, T., Thompson, B., Grossi, T., & Kirby, S. (2018). Cultural transmission of melodic and rhythmic universals: Four experiments and a model. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 89-91). Toruń, Poland: NCU Press. doi:10.12775/3991-1.019.
  • Drude, S. (2003). Advanced glossing: A language documentation format and its implementation with Shoebox. In Proceedings of the 2002 International Conference on Language Resources and Evaluation (LREC 2002). Paris: ELRA.

    Abstract

    This paper presents Advanced Glossing, a proposal for a general glossing format designed for language documentation, and a specific setup for the Shoebox program that implements Advanced Glossing to a large extent. Advanced Glossing (AG) goes beyond the traditional Interlinear Morphemic Translation, keeping syntactic and morphological information apart from each other in separate glossing tables. AG provides specific lines for different kinds of annotation (phonetic, phonological, orthographical, prosodic, categorial, structural, relational, and semantic), and it allows for gradual, successive, incomplete, and partial filling in case some information is irrelevant, unknown or uncertain. The implementation of AG in Shoebox sets up several databases. Each documented text is represented as a file of syntactic glossings. The morphological glossings are kept in a separate database. As an additional feature, interaction with lexical databases is possible. The implementation makes use of the interlinearizing automatism provided by Shoebox, thus obtaining the table format for the alignment of lines in cells, and for semi-automatic filling-in of information in glossing tables which has been extracted from databases.
  • Drude, S. (2003). Digitizing and annotating texts and field recordings in the Awetí project. In Proceedings of the EMELD Language Digitization Project Conference 2003. Workshop on Digitizing and Annotating Text and Field Recordings, LSA Institute, Michigan State University, July 11th-13th.

    Abstract

    Given that several initiatives worldwide currently explore the new field of documentation of endangered languages, the E-MELD project proposes to survey and unite procedures, techniques and results in order to achieve its main goal, "the formulation and promulgation of best practice in linguistic markup of texts and lexicons". In this context, this year's workshop deals with the processing of recorded texts. I assume the most valuable contribution I could make to the workshop is to show the procedures and methods used in the Awetí Language Documentation Project. The procedures applied in the Awetí Project are not necessarily representative of all the projects in the DOBES program, and they may very well fall short in several respects of being best practice, but I hope they might provide a good and concrete starting point for comparison, criticism and further discussion. The procedures to be presented include: taping with digital devices; digitizing (preliminarily in the field, later definitively by the TIDEL-team at the Max Planck Institute in Nijmegen); segmenting and transcribing, using the Transcriber computer program; translating (on paper, or while transcribing); adding more specific annotation, using the Shoebox program; and converting the annotation to the ELAN-format developed by the TIDEL-team, and doing annotation with ELAN. Focus will be on the different types of annotation. In particular, I will present, justify and discuss Advanced Glossing, a text annotation format developed by H.-H. Lieb and myself designed for language documentation. It will be shown how Advanced Glossing can be applied using the Shoebox program. The Shoebox setup used in the Awetí Project will be shown in greater detail, including lexical databases and semi-automatic interaction between different database types (jumping, interlinearization). (Freie Universität Berlin and Museu Paraense Emílio Goeldi, with funding from the Volkswagen Foundation.)
  • Duarte, R., Uhlmann, M., Van den Broek, D., Fitz, H., Petersson, K. M., & Morrison, A. (2018). Encoding symbolic sequences with spiking neural reservoirs. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/IJCNN.2018.8489114.

    Abstract

    Biologically inspired spiking networks are an important tool to study the nature of computation and cognition in neural systems. In this work, we investigate the representational capacity of spiking networks engaged in an identity mapping task. We compare two schemes for encoding symbolic input, one in which input is injected as a direct current and one where input is delivered as a spatio-temporal spike pattern. We test the ability of networks to discriminate their input as a function of the number of distinct input symbols. We also compare performance using either membrane potentials or filtered spike trains as state variable. Furthermore, we investigate how the circuit behavior depends on the balance between excitation and inhibition, and the degree of synchrony and regularity in its internal dynamics. Finally, we compare different linear methods of decoding population activity onto desired target labels. Overall, our results suggest that even this simple mapping task is strongly influenced by design choices on input encoding, state-variables, circuit characteristics and decoding methods, and these factors can interact in complex ways. This work highlights the importance of constraining computational network models of behavior by available neurobiological evidence.
  • Duffield, N., & Matsuo, A. (2003). Factoring out the parallelism effect in ellipsis: An interactional approach? In J. Chilar, A. Franklin, D. Keizer, & I. Kimbara (Eds.), Proceedings of the 39th Annual Meeting of the Chicago Linguistic Society (CLS) (pp. 591-603). Chicago: Chicago Linguistic Society.

    Abstract

    Traditionally, there have been three standard assumptions made about the Parallelism Effect on VP-ellipsis, namely that the effect is categorical, that it applies asymmetrically and that it is uniquely due to syntactic factors. Based on the results of a series of experiments involving online and offline tasks, it will be argued that the Parallelism Effect is instead noncategorical and interactional. The factors investigated include construction type, conceptual and morpho-syntactic recoverability, finiteness and anaphor type (to test VP-anaphora). The results show that parallelism is gradient rather than categorical, affects both VP-ellipsis and anaphora, and is influenced by both structural and non-structural factors.
  • Ergin, R., Senghas, A., Jackendoff, R., & Gleitman, L. (2018). Structural cues for symmetry, asymmetry, and non-symmetry in Central Taurus Sign Language. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 104-106). Toruń, Poland: NCU Press. doi:10.12775/3991-1.025.
  • Ernestus, M., & Baayen, R. H. (2007). The comprehension of acoustically reduced morphologically complex words: The roles of deletion, duration, and frequency of occurence. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 773-776). Dudweiler: Pirrot.

    Abstract

    This study addresses the roles of segment deletion, durational reduction, and frequency of use in the comprehension of morphologically complex words. We report two auditory lexical decision experiments with reduced and unreduced prefixed Dutch words. We found that segment deletions as such delayed comprehension. Simultaneously, however, longer durations of the different parts of the words appeared to increase lexical competition, either from the word’s stem (Experiment 1) or from the word’s morphological continuation forms (Experiment 2). Increased lexical competition slowed down especially the comprehension of low frequency words, which shows that speakers do not try to meet listeners’ needs when they reduce especially high frequency words.
  • Evans, N., Levinson, S. C., & Sterelny, K. (Eds.). (2021). Thematic issue on evolution of kinship systems [Special Issue]. Biological theory, 16.
  • Eviatar, Z., & Huettig, F. (Eds.). (2021). Literacy and writing systems [Special Issue]. Journal of Cultural Cognitive Science.
  • Falk, J. J., Zhang, Y., Scheutz, M., & Yu, C. (2021). Parents adaptively use anaphora during parent-child social interaction. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1472-1478). Vienna: Cognitive Science Society.

    Abstract

    Anaphora, a ubiquitous feature of natural language, poses a particular challenge to young children as they first learn language due to its referential ambiguity. In spite of this, parents and caregivers use anaphora frequently in child-directed speech, potentially presenting a risk to effective communication if children do not yet have the linguistic capabilities of resolving anaphora successfully. Through an eye-tracking study in a naturalistic free-play context, we examine the strategies that parents employ to calibrate their use of anaphora to their child's linguistic development level. We show that, in this way, parents are able to intuitively scaffold the complexity of their speech such that greater referential ambiguity does not hurt overall communication success.
  • Galke, L., Franke, B., Zielke, T., & Scherp, A. (2021). Lifelong learning of graph neural networks for open-world node classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ: IEEE. doi:10.1109/IJCNN52387.2021.9533412.

    Abstract

    Graph neural networks (GNNs) have emerged as the standard method for numerous tasks on graph-structured data such as node classification. However, real-world graphs are often evolving over time and even new classes may arise. We model these challenges as an instance of lifelong learning, in which a learner faces a sequence of tasks and may take over knowledge acquired in past tasks. Such knowledge may be stored explicitly as historic data or implicitly within model parameters. In this work, we systematically analyze the influence of implicit and explicit knowledge. Therefore, we present an incremental training method for lifelong learning on graphs and introduce a new measure based on k-neighborhood time differences to address variances in the historic data. We apply our training method to five representative GNN architectures and evaluate them on three new lifelong node classification datasets. Our results show that no more than 50% of the GNN's receptive field is necessary to retain at least 95% accuracy compared to training over the complete history of the graph data. Furthermore, our experiments confirm that implicit knowledge becomes more important when less explicit knowledge is available.
  • Galke, L., Gerstenkorn, G., & Scherp, A. (2018). A case study of closed-domain response suggestion with limited training data. In M. Elloumi, M. Granitzer, A. Hameurlain, C. Seifert, B. Stein, A. Min Tjoa, & R. Wagner (Eds.), Database and Expert Systems Applications: DEXA 2018 International Workshops, BDMICS, BIOKDD, and TIR, Regensburg, Germany, September 3–6, 2018, Proceedings (pp. 218-229). Cham, Switzerland: Springer.

    Abstract

    We analyze the problem of response suggestion in a closed domain along a real-world scenario of a digital library. We present a text-processing pipeline to generate question-answer pairs from chat transcripts. On this limited amount of training data, we compare retrieval-based, conditioned-generation, and dedicated representation learning approaches for response suggestion. Our results show that retrieval-based methods that strive to find similar, known contexts are preferable over parametric approaches from the conditioned-generation family, when the training data is limited. We, however, identify a specific representation learning approach that is competitive to the retrieval-based approaches despite the training data limitation.
  • Galke, L., Seidlmayer, E., Lüdemann, G., Langnickel, L., Melnychuk, T., Förstner, K. U., Tochtermann, K., & Schultz, C. (2021). COVID-19++: A citation-aware Covid-19 dataset for the analysis of research dynamics. In Y. Chen, H. Ludwig, Y. Tu, U. Fayyad, X. Zhu, X. Hu, S. Byna, X. Liu, J. Zhang, S. Pan, V. Papalexakis, J. Wang, A. Cuzzocrea, & C. Ordonez (Eds.), Proceedings of the 2021 IEEE International Conference on Big Data (pp. 4350-4355). Piscataway, NJ: IEEE.

    Abstract

    COVID-19 research datasets are crucial for analyzing research dynamics. Most collections of COVID-19 research items do not include cited works and do not have annotations from a controlled vocabulary. Starting with ZB MED KE data on COVID-19, which comprises CORD-19, we assemble a new dataset that includes cited work and MeSH annotations for all records. Furthermore, we conduct experiments on the analysis of research dynamics, in which we investigate predicting links in a co-annotation graph created on the basis of the new dataset. Surprisingly, we find that simple heuristic methods are better at predicting future links than more sophisticated approaches such as graph neural networks.
  • Galke, L., Mai, F., & Vagliano, I. (2018). Multi-modal adversarial autoencoders for recommendations of citations and subject labels. In T. Mitrovic, J. Zhang, L. Chen, & D. Chin (Eds.), UMAP '18: Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (pp. 197-205). New York: ACM. doi:10.1145/3209219.3209236.

    Abstract

    We present multi-modal adversarial autoencoders for recommendation and evaluate them on two different tasks: citation recommendation and subject label recommendation. We analyze the effects of adversarial regularization, sparsity, and different input modalities. By conducting 408 experiments, we show that adversarial regularization consistently improves the performance of autoencoders for recommendation. We demonstrate, however, that the two tasks differ in the semantics of item co-occurrence in the sense that item co-occurrence resembles relatedness in the case of citations, yet implies diversity in the case of subject labels. Our results reveal that supplying the partial item set as input is only helpful when item co-occurrence resembles relatedness. When facing a new recommendation task it is therefore crucial to consider the semantics of item co-occurrence for the choice of an appropriate model.
  • Goudbeek, M., Swingley, D., & Kluender, K. R. (2007). The limits of multidimensional category learning. In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 2325-2328). Adelaide: Causal Productions.

    Abstract

    Distributional learning is almost certainly involved in the human acquisition of phonetic categories. Because speech is inherently a multidimensional signal, learning phonetic categories entails multidimensional learning. Yet previous studies of auditory category learning have shown poor maintenance of learned multidimensional categories. Two experiments explored ways to improve maintenance: by increasing the costs associated with applying a unidimensional strategy; by providing additional information about the category structures; and by giving explicit instructions on how to categorize. Only with explicit instructions were categorization strategies maintained in a maintenance phase without supervision or distributional information.
  • Greenfield, M. D., Honing, H., Kotz, S. A., & Ravignani, A. (Eds.). (2021). Synchrony and rhythm interaction: From the brain to behavioural ecology [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 376.
  • Gürcanli, Ö., Nakipoglu Demiralp, M., & Ozyurek, A. (2007). Shared information and argument omission in Turkish. In H. Caunt-Nulton, S. Kulatilake, & I. Woo (Eds.), Proceedings of the 31st Annual Boston University Conference on Language Development (pp. 267-273). Somerville, Mass: Cascadilla Press.
  • Harbusch, K., & Kempen, G. (2007). Clausal coordinate ellipsis in German: The TIGER treebank as a source of evidence. In J. Nivre, H. J. Kaalep, M. Kadri, & M. Koit (Eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007) (pp. 81-88). Tartu: University of Tartu.

    Abstract

    Syntactic parsers and generators need high-quality grammars of coordination and coordinate ellipsis—structures that occur very frequently but are much less well understood theoretically than many other domains of grammar. Modern grammars of coordinate ellipsis are based nearly exclusively on linguistic judgments (intuitions). The extent to which grammar rules based on this type of empirical evidence generate all and only the structures in text corpora is unknown. As part of a project on the development of a grammar and a generator for coordinate ellipsis in German, we undertook an extensive exploration of the TIGER treebank—a syntactically annotated corpus of about 50,000 newspaper sentences. We report (1) frequency data for the various patterns of coordinate ellipsis, and (2) several rarely (but regularly) occurring ‘fringe deviations’ from the intuition-based rules for several ellipsis types. This information can help improve parser and generator performance.
  • Harbusch, K., Breugel, C., Koch, U., & Kempen, G. (2007). Interactive sentence combining and paraphrasing in support of integrated writing and grammar instruction: A new application area for natural language sentence generators. In S. Busemann (Ed.), Proceedings of the 11th European Workshop in Natural Language Generation (ENLG07) (pp. 65-68). ACL Anthology.

    Abstract

    The potential of sentence generators as engines in Intelligent Computer-Assisted Language Learning and teaching (ICALL) software has hardly been explored. We sketch the prototype of COMPASS, a system that supports integrated writing and grammar curricula for 10 to 14 year old elementary or secondary schoolers. The system enables first- or second-language teachers to design controlled writing exercises, in particular of the “sentence combining” variety. The system includes facilities for error diagnosis and on-line feedback. Syntactic structures built by students or system can be displayed as easily understood phrase-structure or dependency trees, adapted to the student’s level of grammatical knowledge. The heart of the system is a specially designed generator capable of lexically guided sentence generation, of generating syntactic paraphrases, and displaying syntactic structures visually.
  • Harmon, Z., Barak, L., Shafto, P., Edwards, J., & Feldman, N. H. (2021). Making heads or tails of it: A competition–compensation account of morphological deficits in language impairment. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1872-1878). Vienna: Cognitive Science Society.

    Abstract

    Children with developmental language disorder (DLD) regularly use the base form of verbs (e.g., dance) instead of inflected forms (e.g., danced). We propose an account of this behavior in which children with DLD have difficulty processing novel inflected verbs in their input. This leads the inflected form to face stronger competition from alternatives. Competition is resolved by the production of a more accessible alternative with high semantic overlap with the inflected form: in English, the bare form. We test our account computationally by training a nonparametric Bayesian model that infers the productivity of the inflectional suffix (-ed). We systematically vary the number of novel types of inflected verbs in the input to simulate the input as processed by children with and without DLD. Modeling results are consistent with our hypothesis, suggesting that children’s inconsistent use of inflectional morphemes could stem from inferences they make on the basis of impoverished data.
  • Herbst, L. E. (2007). German 5-year-olds' intonational marking of information status. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1557-1560). Dudweiler: Pirrot.

    Abstract

    This paper reports on findings from an elicited production task with German 5-year-old children, investigating their use of intonation to mark information status of discourse referents. In line with findings for adults, new referents were preferably marked by H* and L+H*; textually given referents were mainly deaccented. Accessible referents (whose first mentions were less recent) were mostly accented, and predominantly also realised with H* and L+H*, showing children’s sensitivity to recency of mention. No evidence for the consistent use of a special ‘accessibility accent’ H+L* (as has been proposed for adult German) was found.
  • Hintz, F., Voeten, C. C., McQueen, J. M., & Scharenborg, O. (2021). The effects of onset and offset masking on the time course of non-native spoken-word recognition in noise. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 133-139). Vienna: Cognitive Science Society.

    Abstract

    Using the visual-word paradigm, the present study investigated the effects of word onset and offset masking on the time course of non-native spoken-word recognition in the presence of background noise. In two experiments, Dutch non-native listeners heard English target words, preceded by carrier sentences that were noise-free (Experiment 1) or contained intermittent noise (Experiment 2). Target words were either onset- or offset-masked or not masked at all. Results showed that onset masking delayed target word recognition more than offset masking did, suggesting that – similar to natives – non-native listeners strongly rely on word onset information during word recognition in noise.
  • Holler, J., & Beattie, G. (2007). Gesture use in social interaction: how speakers' gestures can reflect listeners' thinking. In L. Mondada (Ed.), On-line Proceedings of the 2nd Conference of the International Society of Gesture Studies, Lyon, France 15-18 June 2005.
  • Hopman, E., Thompson, B., Austerweil, J., & Lupyan, G. (2018). Predictors of L2 word learning accuracy: A big data investigation. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 513-518). Austin, TX: Cognitive Science Society.

    Abstract

    What makes some words harder to learn than others in a second language? Although some robust factors have been identified based on small scale experimental studies, many relevant factors are difficult to study in such experiments due to the amount of data necessary to test them. Here, we investigate what factors affect the ease of learning of a word in a second language using a large data set of users learning English as a second language through the Duolingo mobile app. In a regression analysis, we test and confirm the well-studied effect of cognate status on word learning accuracy. Furthermore, we find significant effects for both cross-linguistic semantic alignment and English semantic density, two novel predictors derived from large scale distributional models of lexical semantics. Finally, we provide data on several other psycholinguistically plausible word level predictors. We conclude with a discussion of the limits, benefits and future research potential of using big data for investigating second language learning.
  • Huettig, F., Kolinsky, R., & Lachmann, T. (Eds.). (2018). The effects of literacy on cognition and brain functioning [Special Issue]. Language, Cognition and Neuroscience, 33(3).
  • Isaac, A., Zinn, C., Matthezing, H., Van de Meij, H., Schlobach, S., & Wang, S. (2007). The value of usage scenarios for thesaurus alignment in cultural heritage context. In Proceedings of the ISWC 2007 workshop in cultural heritage on the semantic web.

    Abstract

    Thesaurus alignment is important for efficient access to heterogeneous Cultural Heritage data. Current ontology alignment techniques provide solutions, but with limited value in practice, because the requirements from usage scenarios are rarely taken into account. In this paper, we start from particular requirements for book re-indexing and investigate possible ways of developing, deploying and evaluating thesaurus alignment techniques in this context. We then compare different aspects of this scenario with others from a more general perspective.
  • Isbilen, E., Frost, R. L. A., Monaghan, P., & Christiansen, M. (2018). Bridging artificial and natural language learning: Comparing processing- and reflection-based measures of learning. In C. Kalish, M. Rau, J. Zhu, & T. T. Rogers (Eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society (CogSci 2018) (pp. 1856-1861). Austin, TX: Cognitive Science Society.

    Abstract

    A common assumption in the cognitive sciences is that artificial and natural language learning rely on shared mechanisms. However, attempts to bridge the two have yielded ambiguous results. We suggest that an empirical disconnect between the computations employed during learning and the methods employed at test may explain these mixed results. Further, we propose statistically-based chunking as a potential computational link between artificial and natural language learning. We compare the acquisition of non-adjacent dependencies to that of natural language structure using two types of tasks: reflection-based 2AFC measures, and processing-based recall measures, the latter being more computationally analogous to the processes used during language acquisition. Our results demonstrate that task-type significantly influences the correlations observed between artificial and natural language acquisition, with reflection-based and processing-based measures correlating within – but not across – task-type. These findings have fundamental implications for artificial-to-natural language comparisons, both methodologically and theoretically.
  • Janse, E., Van der Werff, M., & Quené, H. (2007). Listening to fast speech: Aging and sentence context. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 681-684). Dudweiler: Pirrot.

    Abstract

    In this study we investigated to what extent a meaningful sentence context facilitates spoken word processing in young and older listeners if listening is made taxing by time-compressing the speech. Even though elderly listeners have been shown to benefit more from sentence context in difficult listening conditions than young listeners, time compression of speech may interfere with semantic comprehension, particularly in older listeners because of cognitive slowing. The results of a target detection experiment showed that, unlike young listeners who showed facilitation by context at both rates, elderly listeners showed context facilitation at the intermediate, but not at the fastest rate. This suggests that semantic interpretation lags behind target identification.
  • Janse, E. (2003). Word perception in natural-fast and artificially time-compressed speech. In M. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of the Phonetic Sciences (pp. 3001-3004).
  • Janssen, R., Moisik, S. R., & Dediu, D. (2018). Agent model reveals the influence of vocal tract anatomy on speech during ontogeny and glossogeny. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 171-174). Toruń, Poland: NCU Press. doi:10.12775/3991-1.042.
  • Jesse, A., & McQueen, J. M. (2007). Prelexical adjustments to speaker idiosyncracies: Are they position-specific? In H. van Hamme, & R. van Son (Eds.), Proceedings of Interspeech 2007 (pp. 1597-1600). Adelaide: Causal Productions.

    Abstract

    Listeners use lexical knowledge to adjust their prelexical representations of speech sounds in response to the idiosyncratic pronunciations of particular speakers. We used an exposure-test paradigm to investigate whether this type of perceptual learning transfers across syllabic positions. No significant learning effect was found in Experiment 1, where exposure sounds were onsets and test sounds were codas. Experiments 2-4 showed that there was no learning even when both exposure and test sounds were onsets. But a trend was found when exposure sounds were codas and test sounds were onsets (Experiment 5). This trend was smaller than the robust effect previously found for the coda-to-coda case. These findings suggest that knowledge about idiosyncratic pronunciations may be position specific: Knowledge about how a speaker produces sounds in one position, if it can be acquired at all, influences perception of sounds in that position more strongly than of sounds in another position.
  • Jesse, A., McQueen, J. M., & Page, M. (2007). The locus of talker-specific effects in spoken-word recognition. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 1921-1924). Dudweiler: Pirrot.

    Abstract

    Words repeated in the same voice are better recognized than when they are repeated in a different voice. Such findings have been taken as evidence for the storage of talker-specific lexical episodes. But results on perceptual learning suggest that talker-specific adjustments concern sublexical representations. This study thus investigates whether voice-specific repetition effects in auditory lexical decision are lexical or sublexical. The same critical set of items in Block 2 were, depending on materials in Block 1, either same-voice or different-voice word repetitions, new words comprising re-orderings of phonemes used in the same voice in Block 1, or new words with previously unused phonemes. Results show a benefit for words repeated by the same talker, and a smaller benefit for words consisting of phonemes repeated by the same talker. Talker-specific information thus appears to influence word recognition at multiple representational levels.
  • Jesse, A., & McQueen, J. M. (2007). Visual lexical stress information in audiovisual spoken-word recognition. In J. Vroomen, M. Swerts, & E. Krahmer (Eds.), Proceedings of the International Conference on Auditory-Visual Speech Processing 2007 (pp. 162-166). Tilburg: University of Tilburg.

    Abstract

    Listeners use suprasegmental auditory lexical stress information to resolve the competition words engage in during spoken-word recognition. The present study investigated whether (a) visual speech provides lexical stress information, and, more importantly, (b) whether this visual lexical stress information is used to resolve lexical competition. Dutch word pairs that differ in the lexical stress realization of their first two syllables, but not segmentally (e.g., 'OCtopus' and 'okTOber'; capitals marking primary stress) served as auditory-only, visual-only, and audiovisual speech primes. These primes either matched (e.g., 'OCto-'), mismatched (e.g., 'okTO-'), or were unrelated to (e.g., 'maCHI-') a subsequent printed target (octopus), which participants had to make a lexical decision to. To the degree that visual speech contains lexical stress information, lexical decisions to printed targets should be modulated through the addition of visual speech. Results show, however, no evidence for a role of visual lexical stress information in audiovisual spoken-word recognition.
  • Johnson, E. K. (2003). Speaker intent influences infants' segmentation of potentially ambiguous utterances. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1995-1998). Adelaide: Causal Productions.
  • Kanero, J., Franko, I., Oranç, C., Uluşahin, O., Koskulu, S., Adigüzel, Z., Küntay, A. C., & Göksun, T. (2018). Who can benefit from robots? Effects of individual differences in robot-assisted language learning. In Proceedings of the 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 212-217). Piscataway, NJ, USA: IEEE.

    Abstract

    It has been suggested that some individuals may benefit more from social robots than do others. Using second language (L2) as an example, the present study examined how individual differences in attitudes toward robots and personality traits may be related to learning outcomes. Preliminary results with 24 Turkish-speaking adults suggest that negative attitudes toward robots, more specifically thoughts and anxiety about the negative social impact that robots may have on the society, predicted how well adults learned L2 words from a social robot. The possible implications of the findings as well as future directions are also discussed.
  • Karadöller, D. Z., Sumer, B., Ünal, E., & Ozyurek, A. (2021). Spatial language use predicts spatial memory of children: Evidence from sign, speech, and speech-plus-gesture. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 672-678). Vienna: Cognitive Science Society.

    Abstract

    There is a strong relation between children’s exposure to spatial terms and their later memory accuracy. In the current study, we tested whether the production of spatial terms by children themselves predicts memory accuracy and whether and how language modality of these encodings modulates memory accuracy differently. Hearing child speakers of Turkish and deaf child signers of Turkish Sign Language described pictures of objects in various spatial relations to each other and were later tested for their memory accuracy of these pictures in a surprise memory task. We found that having described the spatial relation between the objects predicted better memory accuracy. However, the modality of these descriptions in sign, speech, or speech-plus-gesture did not reveal differences in memory accuracy. We discuss the implications of these findings for the relation between spatial language, memory, and the modality of encoding.
  • Kelly, S. D., & Ozyurek, A. (Eds.). (2007). Gesture, language, and brain [Special Issue]. Brain and Language, 101(3).
  • Kempen, G., & Harbusch, K. (2003). A corpus study into word order variation in German subordinate clauses: Animacy affects linearization independently of function assignment. In Proceedings of AMLaP 2003 (pp. 153-154). Glasgow: Glasgow University.
  • Kempen, G. (1988). De netwerker: Spin in het web of rat in een doolhof? In SURF in theorie en praktijk: Van personal tot supercomputer (pp. 59-61). Amsterdam: Elsevier Science Publishers.
  • Khemlani, S., Leslie, S.-J., Glucksberg, S., & Rubio-Fernández, P. (2007). Do ducks lay eggs? How people interpret generic assertions. In D. S. McNamara, & J. G. Trafton (Eds.), Proceedings of the 29th Annual Conference of the Cognitive Science Society (CogSci 2007). Austin, TX: Cognitive Science Society.
  • Klein, W., & Von Stutterheim, C. (Eds.). (2007). Sprachliche Perspektivierung [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 145.
  • Klein, W., & Franceschini, R. (Eds.). (2003). Einfache Sprache [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 131.
  • Klein, W. (Ed.). (1988). Sprache Kranker [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (69).
  • Klein, W. (Ed.). (1985). Schriftlichkeit [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (59).
  • Koutamanis, E., Kootstra, G. J., Dijkstra, T., & Unsworth, S. (2021). Lexical priming as evidence for language-nonselective access in the simultaneous bilingual child's lexicon. In D. Dionne, & L.-A. Vidal Covas (Eds.), BUCLD 45: Proceedings of the 45th annual Boston University Conference on Language Development (pp. 413-430). Somerville, MA: Cascadilla Press.
  • Kuzla, C., & Ernestus, M. (2007). Prosodic conditioning of phonetic detail of German plosives. In J. Trouvain, & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS 2007) (pp. 461-464). Dudweiler: Pirrot.

    Abstract

    The present study investigates the influence of prosodic structure on the fine-grained phonetic details of German plosives which also cue the phonological fortis-lenis contrast. Closure durations were found to be longer at higher prosodic boundaries. There was also less glottal vibration in lenis plosives at higher prosodic boundaries. Voice onset time in lenis plosives was not affected by prosody. In contrast, for the fortis plosives VOT decreased at higher boundaries, as did the maximal intensity of the release. These results demonstrate that the effects of prosody on different phonetic cues can go into opposite directions, but are overall constrained by the need to maintain phonological contrasts. While prosodic effects on some cues are compatible with a ‘fortition’ account of prosodic strengthening or with a general feature enhancement explanation, the effects on others enhance paradigmatic contrasts only within a given prosodic position.
  • Kuzla, C. (2003). Prosodically-conditioned variation in the realization of domain-final stops and voicing assimilation of domain-initial fricatives in German. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2829-2832). Adelaide: Causal Productions.
  • Lai, V. T., Chang, M., Duffield, C., Hwang, J., Xue, N., & Palmer, M. (2007). Defining a methodology for mapping Chinese and English sense inventories. In Proceedings of the 8th Chinese Lexical Semantics Workshop 2007 (CLSW 2007). The Hong Kong Polytechnic University, Hong Kong, May 21-23 (pp. 59-65).

    Abstract

    In this study, we explored methods for linking Chinese and English sense inventories using two opposing approaches: creating links (1) bottom-up: by starting at the finer-grained sense level then proceeding to the verb subcategorization frames and (2) top-down: by starting directly with the more coarse-grained frame levels. The sense inventories for linking include pre-existing corpora, such as English Propbank (Palmer, Gildea, and Kingsbury, 2005), Chinese Propbank (Xue and Palmer, 2004) and English WordNet (Fellbaum, 1998) and newly created corpora, the English and Chinese Sense Inventories from DARPA-GALE OntoNotes. In the linking task, we selected a group of highly frequent and polysemous communication verbs, including say, ask, talk, and speak in English, and shuo, biao-shi, jiang, and wen in Chinese. We found that with the bottom-up method, although speakers of both languages agreed on the links between senses, the subcategorization frames of the corresponding senses did not match consistently. With the top-down method, if the verb frames match in both languages, their senses line up more quickly to each other. The results indicate that the top-down method is more promising in linking English and Chinese sense inventories.
  • De Lange, F. P., Hagoort, P., & Toni, I. (2003). Differential fronto-parietal contributions to visual and motor imagery. NeuroImage, 19(2), e2094-e2095.

    Abstract

    Mental imagery is a cognitive process crucial to human reasoning. Numerous studies have characterized specific instances of this cognitive ability, as evoked by visual imagery (VI) or motor imagery (MI) tasks. However, it remains unclear which neural resources are shared between VI and MI, and which are exclusively related to MI. To address this issue, we have used fMRI to measure human brain activity during performance of VI and MI tasks. Crucially, we have modulated the imagery process by manipulating the degree of mental rotation necessary to solve the tasks. We focused our analysis on changes in neural signal as a function of the degree of mental rotation in each task.
  • Lattenkamp, E. Z., Vernes, S. C., & Wiegrebe, L. (2018). Mammalian models for the study of vocal learning: A new paradigm in bats. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 235-237). Toruń, Poland: NCU Press. doi:10.12775/3991-1.056.
  • Lauscher, A., Eckert, K., Galke, L., Scherp, A., Rizvi, S. T. R., Ahmed, S., Dengel, A., Zumstein, P., & Klein, A. (2018). Linked open citation database: Enabling libraries to contribute to an open and interconnected citation graph. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 109-118). New York: ACM. doi:10.1145/3197026.3197050.

    Abstract

    Citations play a crucial role in the scientific discourse, in information retrieval, and in bibliometrics. Many initiatives are currently promoting the idea of having free and open citation data. Creation of citation data, however, is not part of the cataloging workflow in libraries nowadays.
    In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that efficiently cataloging citations in libraries using a semi-automatic approach is possible. We specifically describe the current state of the workflow and its implementation. We show that we could significantly improve the automatic reference extraction that is crucial for the subsequent data curation. We further give insights on the curation and linking process and provide evaluation results that not only direct the further development of the project, but also allow us to discuss its overall feasibility.
  • Lefever, E., Hendrickx, I., Croijmans, I., Van den Bosch, A., & Majid, A. (2018). Discovering the language of wine reviews: A text mining account. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3297-3302). Paris: LREC.

    Abstract

    It is widely held that smells and flavors are impossible to put into words. In this paper we test this claim by seeking predictive patterns in wine reviews, which ostensibly aim to provide guides to perceptual content. Wine reviews have previously been critiqued as random and meaningless. We collected an English corpus of wine reviews with their structured metadata, and applied machine learning techniques to automatically predict the wine's color, grape variety, and country of origin. To train the three supervised classifiers, three different information sources were incorporated: lexical bag-of-words features, domain-specific terminology features, and semantic word embedding features. In addition, using regression analysis we investigated basic review properties, i.e., review length, average word length, and their relationship to the scalar values of price and review score. Our results show that wine experts do share a common vocabulary to describe wines and they use this in a consistent way, which makes it possible to automatically predict wine characteristics based on the review text alone. This means that odors and flavors may be more expressible in language than typically acknowledged.
  • Levelt, C. C., Fikkert, P., & Schiller, N. O. (2003). Metrical priming in speech production. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2481-2485). Adelaide: Causal Productions.

    Abstract

    In this paper we report on four experiments in which we attempted to prime the stress position of Dutch bisyllabic target nouns. These nouns, picture names, had stress on either the first or the second syllable. Auditory prime words had either the same stress as the target or a different stress (e.g., WORtel – MOtor vs. koSTUUM – MOtor; capital letters indicate stressed syllables in prime – target pairs). Furthermore, half of the prime words were semantically related, the other half were unrelated. In none of the experiments was a stress priming effect found. This could mean that stress is not stored in the lexicon. An additional finding was that targets with initial stress had a faster response than targets with a final stress. We hypothesize that bisyllabic words with final stress take longer to be encoded because this stress pattern is irregular with respect to the lexical distribution of bisyllabic stress patterns, even though it can be regular in terms of the metrical stress rules of Dutch.
  • Levelt, W. J. M., & Plomp, R. (1962). Musical consonance and critical bandwidth. In Proceedings of the 4th International Congress on Acoustics (pp. 55-55).
  • Levshina, N., & Moran, S. (Eds.). (2021). Efficiency in human languages: Corpus evidence for universal principles [Special Issue]. Linguistics Vanguard, 7(s3).
  • Lopopolo, A., Frank, S. L., Van den Bosch, A., Nijhof, A., & Willems, R. M. (2018). The Narrative Brain Dataset (NBD), an fMRI dataset for the study of natural language processing in the brain. In B. Devereux, E. Shutova, & C.-R. Huang (Eds.), Proceedings of LREC 2018 Workshop "Linguistic and Neuro-Cognitive Resources" (LiNCR) (pp. 8-11). Paris: LREC.

    Abstract

    We present the Narrative Brain Dataset, an fMRI dataset that was collected during spoken presentation of short excerpts of three stories in Dutch. Together with the brain imaging data, the dataset contains the written versions of the stimulation texts. The texts are accompanied with stochastic (perplexity and entropy) and semantic computational linguistic measures. The richness and unconstrained nature of the data allows the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies. We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural language processing in the brain and to further stimulate data sharing in the field of neuroscience of language.
  • Lupyan, G., Wendorf, A., Berscia, L. M., & Paul, J. (2018). Core knowledge or language-augmented cognition? The case of geometric reasoning. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 252-254). Toruń, Poland: NCU Press. doi:10.12775/3991-1.062.
  • Mai, F., Galke, L., & Scherp, A. (2018). Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text. In J. Chen, M. A. Gonçalves, J. M. Allen, E. A. Fox, M.-Y. Kan, & V. Petras (Eds.), JCDL '18: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (pp. 169-178). New York: ACM.

    Abstract

    For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.
  • Majid, A., & Bowerman, M. (Eds.). (2007). Cutting and breaking events: A crosslinguistic perspective [Special Issue]. Cognitive Linguistics, 18(2).

    Abstract

    This special issue of Cognitive Linguistics explores the linguistic encoding of events of cutting and breaking. In this article we first introduce the project on which it is based by motivating the selection of this conceptual domain, presenting the methods of data collection used by all the investigators, and characterizing the language sample. We then present a new approach to examining crosslinguistic similarities and differences in semantic categorization. Applying statistical modeling to the descriptions of cutting and breaking events elicited from speakers of all the languages, we show that although there is crosslinguistic variation in the number of distinctions made and in the placement of category boundaries, these differences take place within a strongly constrained semantic space: across languages, there is a surprising degree of consensus on the partitioning of events in this domain. In closing, we compare our statistical approach with more conventional semantic analyses, and show how an extensional semantic typological approach like the one illustrated here can help illuminate the intensional distinctions made by languages.
  • Malaisé, V., Gazendam, L., & Brugman, H. (2007). Disambiguating automatic semantic annotation based on a thesaurus structure. In Proceedings of TALN 2007.
  • Mamus, E., Speed, L. J., Ozyurek, A., & Majid, A. (2021). Sensory modality of input influences encoding of motion events in speech but not co-speech gestures. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 376-382). Vienna: Cognitive Science Society.

    Abstract

    Visual and auditory channels have different affordances and this is mirrored in what information is available for linguistic encoding. The visual channel has high spatial acuity, whereas the auditory channel has better temporal acuity. These differences may lead to different conceptualizations of events and affect multimodal language production. Previous studies of motion events typically present visual input to elicit speech and gesture. The present study compared events presented as audio-only, visual-only, or multimodal (visual+audio) input and assessed speech and co-speech gesture for path and manner of motion in Turkish. Speakers with audio-only input mentioned path more and manner less in verbal descriptions, compared to speakers who had visual input. There was no difference in the type or frequency of gestures across conditions, and gestures were dominated by path-only gestures. This suggests that input modality influences speakers’ encoding of path and manner of motion events in speech, but not in co-speech gestures.
  • McQueen, J. M., & Cho, T. (2003). The use of domain-initial strengthening in segmentation of continuous English speech. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 2993-2996). Adelaide: Causal Productions.
  • Meeuwissen, M., Roelofs, A., & Levelt, W. J. M. (2003). Naming analog clocks conceptually facilitates naming digital clocks. In Proceedings of XIII Conference of the European Society of Cognitive Psychology (ESCOP 2003) (pp. 271-271).
  • Merkx, D., & Frank, S. L. (2021). Human sentence processing: Recurrence or attention? In E. Chersoni, N. Hollenstein, C. Jacobs, Y. Oseki, L. Prévot, & E. Santus (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) (pp. 12-22). Stroudsburg, PA, USA: Association for Computational Linguistics (ACL). doi:10.18653/v1/2021.cmcl-1.2.

    Abstract

    Recurrent neural networks (RNNs) have long been an architecture of interest for computational models of human sentence processing. The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks but little is known about its ability to model human language processing. We compare Transformer- and RNN-based language models’ ability to account for measures of human reading effort. Our analysis shows Transformers to outperform RNNs in explaining self-paced reading times and neural activity during reading English sentences, challenging the widely held idea that human sentence processing involves recurrent and immediate processing and providing evidence for cue-based retrieval.
  • Merkx, D., Frank, S. L., & Ernestus, M. (2021). Semantic sentence similarity: Size does not always matter. In Proceedings of Interspeech 2021 (pp. 4393-4397). doi:10.21437/Interspeech.2021-1464.

    Abstract

    This study addresses the question whether visually grounded speech recognition (VGS) models learn to capture sentence semantics without access to any prior linguistic knowledge. We produce synthetic and natural spoken versions of a well known semantic textual similarity database and show that our VGS model produces embeddings that correlate well with human semantic similarity judgements. Our results show that a model trained on a small image-caption database outperforms two models trained on much larger databases, indicating that database size is not all that matters. We also investigate the importance of having multiple captions per image and find that this is indeed helpful even if the total number of images is lower, suggesting that paraphrasing is a valuable learning signal. While the general trend in the field is to create ever larger datasets to train models on, our findings indicate other characteristics of the database can be just as important.
  • Merkx, D., & Scharenborg, O. (2018). Articulatory feature classification using convolutional neural networks. In Proceedings of Interspeech 2018 (pp. 2142-2146). doi:10.21437/Interspeech.2018-2275.

    Abstract

    The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.
  • Micklos, A., Macuch Silva, V., & Fay, N. (2018). The prevalence of repair in studies of language evolution. In C. Cuskley, M. Flaherty, H. Little, L. McCrohon, A. Ravignani, & T. Verhoef (Eds.), Proceedings of the 12th International Conference on the Evolution of Language (EVOLANG XII) (pp. 316-318). Toruń, Poland: NCU Press. doi:10.12775/3991-1.075.
