Publications

  • Alhama, R. G., Siegelman, N., Frost, R., & Armstrong, B. C. (2019). The role of information in visual word recognition: A perceptually-constrained connectionist account. In A. Goel, C. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 83-89). Austin, TX: Cognitive Science Society.

    Abstract

    Proficient readers typically fixate near the center of a word, with a slight bias towards word onset. We explore a novel account of this phenomenon based on combining information-theory with visual perceptual constraints in a connectionist model of visual word recognition. This account posits that the amount of information-content available for word identification varies across fixation locations and across languages, thereby explaining the overall fixation location bias in different languages, making the novel prediction that certain words are more readily identified when fixating at an atypical fixation location, and predicting specific cross-linguistic differences. We tested these predictions across several simulations in English and Hebrew, and in a pilot behavioral experiment. Results confirmed that the bias to fixate closer to word onset aligns with maximizing information in the visual signal, that some words are more readily identified at atypical fixation locations, and that these effects vary to some degree across languages.
  • Allen, S. E. M. (1998). A discourse-pragmatic explanation for the subject-object asymmetry in early null arguments. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 Conference on Language Acquisition (pp. 10-15). Edinburgh, UK: Edinburgh University Press.

    Abstract

    The present paper assesses discourse-pragmatic factors as a potential explanation for the subject-object asymmetry in early child language. It identifies a set of factors which characterize typical situations of informativeness (Greenfield & Smith, 1976), and uses these factors to identify informative arguments in data from four children aged 2;0 through 3;6 learning Inuktitut as a first language. In addition, it assesses the extent of the links between features of informativeness on one hand and lexical vs. null and subject vs. object arguments on the other. Results suggest that a pragmatics account of the subject-object asymmetry can be upheld to a greater extent than previous research indicates, and that several of the factors characterizing informativeness are good indicators of those arguments which tend to be omitted in early child language.
  • Aristar-Dry, H., Drude, S., Windhouwer, M., Gippert, J., & Nevskaya, I. (2012). „Rendering Endangered Lexicons Interoperable through Standards Harmonization”: The RELISH Project. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 766-770). European Language Resources Association (ELRA).

    Abstract

    The RELISH project promotes language-oriented research by addressing a two-pronged problem: (1) the lack of harmonization between digital standards for lexical information in Europe and America, and (2) the lack of interoperability among existing lexicons of endangered languages, in particular those created with the Shoebox/Toolbox lexicon building software. The cooperation partners in the RELISH project are the University of Frankfurt (FRA), the Max Planck Institute for Psycholinguistics (MPI Nijmegen), and Eastern Michigan University, the host of the Linguist List (ILIT). The project aims at harmonizing key European and American digital standards whose divergence has hitherto impeded international collaboration on language technology for resource creation and analysis, as well as web services for archive access. Focusing on several lexicons of endangered languages, the project will establish a unified way of referencing lexicon structure and linguistic concepts, and develop a procedure for migrating these heterogeneous lexicons to a standards-compliant format. Once developed, the procedure will be generalizable to the large store of lexical resources involved in the LEGO and DoBeS projects.
  • Badimala, P., Mishra, C., Venkataramana, R. K. M., Bukhari, S. S., & Dengel, A. (2019). A Study of Various Text Augmentation Techniques for Relation Classification in Free Text. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (pp. 360-367). Setúbal, Portugal: SciTePress Digital Library. doi:10.5220/0007311003600367.

    Abstract

    Data augmentation techniques have been widely used in visual recognition tasks as it is easy to generate new data by simple and straightforward image transformations. However, when it comes to text data augmentations, it is difficult to find appropriate transformation techniques which also preserve the contextual and grammatical structure of language texts. In this paper, we explore various text data augmentation techniques in text space and word embedding space. We study the effect of various augmented datasets on the efficiency of different deep learning models for relation classification in text.
  • Bauer, B. L. M. (2012). Functions of nominal apposition in Vulgar and Late Latin: Change in progress? In F. Biville, M.-K. Lhommé, & D. Vallat (Eds.), Latin vulgaire – latin tardif IX (pp. 207-220). Lyon: Maison de l’Orient et de la Méditerranée.

    Abstract

    Analysis of the functions of nominal apposition in a number of Latin authors representing different periods, genres, and
    linguistic registers shows (1) that nominal apposition in Latin had a wide variety of functions; (2) that genre had some
    effect on functional use; (3) that change did not affect semantic fields as such; and (4) that with time the occurrence of
    apposition increasingly came to depend on the semantic field and within the semantic field on the individual lexical items.
    The ‘per-word’ treatment – also attested for the structural development of nominal apposition – underscores the specific
    characteristics of nominal apposition as a phenomenon at the cross-roads of syntax and derivational morphology.
  • Benazzo, S., Flecken, M., & Soroli, E. (Eds.). (2012). Typological perspectives on language and thought: Thinking for speaking in L2. [Special Issue]. Language, Interaction and Acquisition, 3(2).
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Listening with great expectations: An investigation of word form anticipations in naturalistic speech. In Proceedings of Interspeech 2019 (pp. 2265-2269). doi:10.21437/Interspeech.2019-2741.

    Abstract

    The event-related potential (ERP) component named phonological mismatch negativity (PMN) arises when listeners hear an unexpected word form in a spoken sentence [1]. The PMN is thought to reflect the mismatch between expected and perceived auditory speech input. In this paper, we use the PMN to test a central premise in the predictive coding framework [2], namely that the mismatch between prior expectations and sensory input is an important mechanism of perception. We test this with natural speech materials containing approximately 50,000 word tokens. The corresponding EEG-signal was recorded while participants (n = 48) listened to these materials. Following [3], we quantify the mismatch with two word probability distributions (WPD): a WPD based on preceding context, and a WPD that is additionally updated based on the incoming audio of the current word. We use the between-WPD cross entropy for each word in the utterances and show that a higher cross entropy correlates with a more negative PMN. Our results show that listeners anticipate auditory input while processing each word in naturalistic speech. Moreover, complementing previous research, we show that predictive language processing occurs across the whole probability spectrum.
  • Bentum, M., Ten Bosch, L., Van den Bosch, A., & Ernestus, M. (2019). Quantifying expectation modulation in human speech processing. In Proceedings of Interspeech 2019 (pp. 2270-2274). doi:10.21437/Interspeech.2019-2685.

    Abstract

    The mismatch between top-down predicted and bottom-up perceptual input is an important mechanism of perception according to the predictive coding framework (Friston, [1]). In this paper we develop and validate a new information-theoretic measure that quantifies the mismatch between expected and observed auditory input during speech processing. We argue that such a mismatch measure is useful for the study of speech processing. To compute the mismatch measure, we use naturalistic speech materials containing approximately 50,000 word tokens. For each word token we first estimate the prior word probability distribution with the aid of statistical language modelling, and next use automatic speech recognition to update this word probability distribution based on the unfolding speech signal. We validate the mismatch measure with multiple analyses, and show that the auditory-based update improves the probability of the correct word and lowers the uncertainty of the word probability distribution. Based on these results, we argue that it is possible to explicitly estimate the mismatch between predicted and perceived speech input with the cross entropy between word expectations computed before and after an auditory update.
  • Bergmann, C., Boves, L., & Ten Bosch, L. (2012). A model of the Headturn Preference Procedure: Linking cognitive processes to overt behaviour. In Proceedings of the 2012 IEEE Conference on Development and Learning and Epigenetic Robotics (IEEE ICDL-EpiRob 2012), San Diego, CA.

    Abstract

    The study of first language acquisition still strongly relies on behavioural methods to measure underlying linguistic abilities. In the present paper, we closely examine and model one such method, the headturn preference procedure (HPP), which is widely used to measure infant speech segmentation and word recognition abilities. Our model takes real speech as input, and only uses basic sensory processing and cognitive capabilities to simulate observable behaviour. We show that the familiarity effect found in many HPP experiments can be simulated without using the phonetic and phonological skills necessary for segmenting test sentences into words. The explicit modelling of the process that converts the result of the cognitive processing of the test sentences into observable behaviour uncovered two issues that can lead to null-results in HPP studies. Our simulations show that caution is needed in making inferences about underlying language skills from behaviour in HPP experiments. The simulations also generated questions that must be addressed in future HPP studies.
  • Bögels, S., Barr, D., Garrod, S., & Kessler, K. (2013). "Are we still talking about the same thing?" MEG reveals perspective-taking in response to pragmatic violations, but not in anticipation. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 215-220). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0066/index.html.

    Abstract

    The current study investigates whether mentalizing, or taking the perspective of your interlocutor, plays an essential role throughout a conversation or whether it is mostly used in reaction to misunderstandings. This study is the first to use a brain-imaging method, MEG, to answer this question. In a first phase of the experiment, MEG participants interacted "live" with a confederate who set naming precedents for certain pictures. In a later phase, these precedents were sometimes broken by a speaker who named the same picture in a different way. This could be done by the same speaker, who set the precedent, or by a different speaker. Source analysis of MEG data showed that in the 800 ms before the naming, when the picture was already on the screen, episodic memory and language areas were activated, but no mentalizing areas, suggesting that the speaker's naming intentions were not anticipated by the listener on the basis of shared experiences. Mentalizing areas only became activated after the same speaker had broken a precedent, which we interpret as a reaction to the violation of conversational pragmatics.
  • Bone, D., Ramanarayanan, V., Narayanan, S., Hoedemaker, R. S., & Gordon, P. C. (2013). Analyzing eye-voice coordination in rapid automatized naming. In F. Bimbot, C. Cerisara, G. Fougeron, L. Gravier, L. Lamel, F. Pellegrino, & P. Perrier (Eds.), INTERSPEECH-2013: 14th Annual Conference of the International Speech Communication Association (pp. 2425-2429). ISCA Archive. Retrieved from http://www.isca-speech.org/archive/interspeech_2013/i13_2425.html.

    Abstract

    Rapid Automatized Naming (RAN) is a powerful tool for predicting future reading skill. A person’s ability to quickly name symbols as they scan a table is related to higher-level reading proficiency in adults and is predictive of future literacy gains in children. However, noticeable differences are present in the strategies or patterns within groups having similar task completion times. Thus, a further stratification of RAN dynamics may lead to better characterization and later intervention to support reading skill acquisition. In this work, we analyze the dynamics of the eyes, voice, and the coordination between the two during performance. It is shown that fast performers are more similar to each other than to slow performers in their patterns, but not vice versa. Further insights are provided about the patterns of more proficient subjects. For instance, fast performers tended to exhibit smoother behavior contours, suggesting a more stable perception-production process.
  • Brandt, M., Nitschke, S., & Kidd, E. (2012). Experience and processing of relative clauses in German. In A. K. Biller, E. Y. Chung, & A. E. Kimball (Eds.), Proceedings of the 36th annual Boston University Conference on Language Development (BUCLD 36) (pp. 87-100). Boston, MA: Cascadilla Press.
  • Brehm, L., Jackson, C. N., & Miller, K. L. (2019). Incremental interpretation in the first and second language. In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 109-122). Somerville, MA: Cascadilla Press.
  • Broeder, D., Van Uytvanck, D., & Senft, G. (2012). Citing on-line language resources. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 1391-1394). European Language Resources Association (ELRA).

    Abstract

    Although the possibility of referring to or citing on-line data from publications is seen, at least theoretically, as an important means to provide immediate testable proof or simple illustration of a line of reasoning, the practice has not yet become widespread, and no extensive experience has been gained about the possibilities and problems of referring to raw data-sets. This paper makes a case for investigating the possibility of, and need for, persistent data visualization services that facilitate the inspection and evaluation of the cited data.
  • Broeder, D., Van Uytvanck, D., Gavrilidou, M., Trippel, T., & Windhouwer, M. (2012). Standardizing a component metadata infrastructure. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 1387-1390). European Language Resources Association (ELRA).

    Abstract

    This paper describes the status of the standardization efforts of a Component Metadata approach for describing Language Resources with metadata. Different linguistic and Language & Technology communities such as CLARIN, META-SHARE and NaLiDa use this component approach and see its standardization as a matter for cooperation that has the possibility to create a large interoperable domain of joint metadata. Starting with an overview of the component metadata approach together with the related semantic interoperability tools and services, such as the ISOcat data category registry and the relation registry, we explain the standardization plan and efforts for component metadata within ISO TC37/SC4. Finally, we present information about uptake and plans of the use of component metadata within the three mentioned linguistic and L&T communities.
  • Broersma, M. (2012). Lexical representation of perceptually difficult second-language words [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 2053.

    Abstract

    This study investigates the lexical representation of second-language words that contain difficult to distinguish phonemes. Dutch and English listeners' perception of partially onset-overlapping word pairs like DAFFOdil-DEFIcit and minimal pairs like flash-flesh, was assessed with two cross-modal priming experiments, examining two stages of lexical processing: activation of intended and mismatching lexical representations (Exp.1) and competition between those lexical representations (Exp.2). Exp.1 shows that truncated primes like daffo- and defi- activated lexical representations of mismatching words (either deficit or daffodil) more for L2 than L1 listeners. Exp.2 shows that for minimal pairs, matching primes (prime: flash, target: FLASH) facilitated recognition of visual targets for L1 and L2 listeners alike, whereas mismatching primes (flesh, FLASH) inhibited recognition consistently for L1 listeners but only in a minority of cases for L2 listeners; in most cases, for them, primes facilitated recognition of both words equally strongly. Importantly, all listeners experienced a combination of facilitation and inhibition (and all items sometimes caused facilitation and sometimes inhibition). These results suggest that for all participants, some of the minimal pairs were represented with separate, native-like lexical representations, whereas other pairs were stored as homophones. The nature of the L2 lexical representations thus varied strongly even within listeners.
  • Bruggeman, L., & Cutler, A. (2019). The dynamics of lexical activation and competition in bilinguals’ first versus second language. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1342-1346). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Speech input causes listeners to activate multiple
    candidate words which then compete with one
    another. These include onset competitors, that share a
    beginning (bumper, butter), but also, counterintuitively,
    rhyme competitors, sharing an ending
    (bumper, jumper). In L1, competition is typically
    stronger for onset than for rhyme. In L2, onset
    competition has been attested but rhyme competition
    has heretofore remained largely unexamined. We
    assessed L1 (Dutch) and L2 (English) word
    recognition by the same late-bilingual individuals. In
    each language, eye gaze was recorded as listeners
    heard sentences and viewed sets of drawings: three
    unrelated, one depicting an onset or rhyme competitor
    of a word in the input. Activation patterns revealed
    substantial onset competition but no significant
    rhyme competition in either L1 or L2. Rhyme
    competition may thus be a “luxury” feature of
    maximally efficient listening, to be abandoned when
    resources are scarcer, as in listening by late
    bilinguals, in either language.
  • Casillas, M., & Frank, M. C. (2012). Cues to turn boundary prediction in adults and preschoolers. In S. Brown-Schmidt, J. Ginzburg, & S. Larsson (Eds.), Proceedings of SemDial 2012 (SeineDial): The 16th Workshop on the Semantics and Pragmatics of Dialogue (pp. 61-69). Paris: Université Paris-Diderot.

    Abstract

    Conversational turns often proceed with very brief pauses between speakers. In order to maintain “no gap, no overlap” turntaking, we must be able to anticipate when an ongoing utterance will end, tracking the current speaker for upcoming points of potential floor exchange. The precise set of cues that listeners use for turn-end boundary anticipation is not yet established. We used an eyetracking paradigm to measure adults’ and children’s online turn processing as they watched videos of conversations in their native language (English) and a range of other languages they did not speak. Both adults and children anticipated speaker transitions effectively. In addition, we observed evidence of turn-boundary anticipation for questions even in languages that were unknown to participants, suggesting that listeners’ success in turn-end anticipation does not rely solely on lexical information.
  • Casillas, M., & Frank, M. C. (2013). The development of predictive processes in children’s discourse understanding. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 299-304). Austin, TX: Cognitive Science Society.

    Abstract

    We investigate children’s online predictive processing as it occurs naturally, in conversation. We showed 1–7 year-olds short videos of improvised conversation between puppets, controlling for available linguistic information through phonetic manipulation. Even one- and two-year-old children made accurate and spontaneous predictions about when a turn-switch would occur: they gazed at the upcoming speaker before they heard a response begin. This predictive skill relies on both lexical and prosodic information together, and is not tied to either type of information alone. We suggest that children integrate prosodic, lexical, and visual information to effectively predict upcoming linguistic material in conversation.
  • Chu, M., & Kita, S. (2012). The nature of the beneficial role of spontaneous gesture in spatial problem solving [Abstract]. Cognitive Processing; Special Issue "ICSC 2012, the 5th International Conference on Spatial Cognition: Space and Embodied Cognition". Oral Presentations, 13(Suppl. 1), S39.

    Abstract

    Spontaneous gestures play an important role in spatial problem solving. We investigated the functional role and underlying mechanism of spontaneous gestures in spatial problem solving. In Experiment 1, 132 participants were required to solve a mental rotation task (see Figure 1) without speaking. Participants gestured more frequently in difficult trials than in easy trials. In Experiment 2, 66 new participants were given two identical sets of mental rotation task problems, like the one used in Experiment 1. Participants who were encouraged to gesture in the first set of mental rotation task problems solved more problems correctly than those who were allowed to gesture or those who were prohibited from gesturing both in the first set and in the second set in which all participants were prohibited from gesturing. The gestures produced by the gesture-encouraged group and the gesture-allowed group were not qualitatively different. In Experiment 3, 32 new participants were first given a set of mental rotation problems and then a second set of non-gesturing paper folding problems. The gesture-encouraged group solved more problems correctly in the first set of mental rotation problems and the second set of non-gesturing paper folding problems. We concluded that gesture improves spatial problem solving. Furthermore, gesture has a lasting beneficial effect even when gesture is not available and the beneficial effect is problem-general. We suggested that gesture enhances spatial problem solving by providing a rich sensori-motor representation of the physical world and picking up information that is less readily available to visuo-spatial processes.
  • Collins, J. (2012). The evolution of the Greenbergian word order correlations. In T. C. Scott-Phillips, M. Tamariz, E. A. Cartmill, & J. R. Hurford (Eds.), The evolution of language. Proceedings of the 9th International Conference (EVOLANG9) (pp. 72-79). Singapore: World Scientific.
  • Connell, L., Cai, Z. G., & Holler, J. (2012). Do you see what I'm singing? Visuospatial movement biases pitch perception. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 252-257). Austin, TX: Cognitive Science Society.

    Abstract

    The nature of the connection between musical and spatial processing is controversial. While pitch may be described in spatial terms such as “high” or “low”, it is unclear whether pitch and space are associated but separate dimensions or whether they share representational and processing resources. In the present study, we asked participants to judge whether a target vocal note was the same as (or different from) a preceding cue note. Importantly, target trials were presented as video clips where a singer sometimes gestured upward or downward while singing that target note, thus providing an alternative, concurrent source of spatial information. Our results show that pitch discrimination was significantly biased by the spatial movement in gesture. These effects were eliminated by spatial memory load but preserved under verbal memory load conditions. Together, our findings suggest that pitch and space have a shared representation such that the mental representation of pitch is audiospatial in nature.
  • Crago, M. B., Allen, S. E. M., & Pesco, D. (1998). Issues of Complexity in Inuktitut and English Child Directed Speech. In Proceedings of the twenty-ninth Annual Stanford Child Language Research Forum (pp. 37-46).
  • Cristia, A., & Peperkamp, S. (2012). Generalizing without encoding specifics: Infants infer phonotactic patterns on sound classes. In A. K. Biller, E. Y. Chung, & A. E. Kimball (Eds.), Proceedings of the 36th Annual Boston University Conference on Language Development (BUCLD 36) (pp. 126-138). Somerville, Mass.: Cascadilla Press.

    Abstract

    publication expected April 2012
  • Cutler, A., Burchfield, A., & Antoniou, M. (2019). A criterial interlocutor tally for successful talker adaptation? In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 1485-1489). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Part of the remarkable efficiency of listening is
    accommodation to unfamiliar talkers’ specific
    pronunciations by retuning of phonemic intercategory
    boundaries. Such retuning occurs in second
    (L2) as well as first language (L1); however, recent
    research with emigrés revealed successful adaptation
    in the environmental L2 but, unprecedentedly, not in
    L1 despite continuing L1 use. A possible explanation
    involving relative exposure to novel talkers is here
    tested in heritage language users with Mandarin as
    family L1 and English as environmental language. In
    English, exposure to an ambiguous sound in
    disambiguating word contexts prompted the expected
    adjustment of phonemic boundaries in subsequent
    categorisation. However, no adjustment occurred in
    Mandarin, again despite regular use. Participants
    reported highly asymmetric interlocutor counts in the
    two languages. We conclude that successful retuning
    ability requires regular exposure to novel talkers in
    the language in question, a criterion not met for the
    emigrés’ or for these heritage users’ L1.
  • Cutler, A., & Otake, T. (1998). Assimilation of place in Japanese and Dutch. In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 5 (pp. 1751-1754). Sydney: ICSLP.

    Abstract

    Assimilation of place of articulation across a nasal and a following stop consonant is obligatory in Japanese, but not in Dutch. In four experiments the processing of assimilated forms by speakers of Japanese and Dutch was compared, using a task in which listeners blended pseudo-word pairs such as ranga-serupa. An assimilated blend of this pair would be rampa, an unassimilated blend rangpa. Japanese listeners produced significantly more assimilated than unassimilated forms, both with pseudo-Japanese and pseudo-Dutch materials, while Dutch listeners produced significantly more unassimilated than assimilated forms in each materials set. This suggests that Japanese listeners, whose native-language phonology involves obligatory assimilation constraints, represent the assimilated nasals in nasal-stop sequences as unmarked for place of articulation, while Dutch listeners, who are accustomed to hearing unassimilated forms, represent the same nasal segments as marked for place of articulation.
  • Cutler, A., & Fear, B. D. (1991). Categoricality in acceptability judgements for strong versus weak vowels. In J. Llisterri (Ed.), Proceedings of the ESCA Workshop on Phonetics and Phonology of Speaking Styles (pp. 18.1-18.5). Barcelona, Catalonia: Universitat Autonoma de Barcelona.

    Abstract

    A distinction between strong and weak vowels can be drawn on the basis of vowel quality, of stress, or of both factors. An experiment was conducted in which sets of contextually matched word-initial vowels ranging from clearly strong to clearly weak were cross-spliced, and the naturalness of the resulting words was rated by listeners. The ratings showed that in general cross-spliced words were only significantly less acceptable than unspliced words when schwa was not involved; this supports a categorical distinction based on vowel quality.
  • Cutler, A. (1987). Components of prosodic effects in speech recognition. In Proceedings of the Eleventh International Congress of Phonetic Sciences: Vol. 1 (pp. 84-87). Tallinn: Academy of Sciences of the Estonian SSR, Institute of Language and Literature.

    Abstract

    Previous research has shown that listeners use the prosodic structure of utterances in a predictive fashion in sentence comprehension, to direct attention to accented words. Acoustically identical words spliced into sentence contexts are responded to differently if the prosodic structure of the context is varied: when the preceding prosody indicates that the word will be accented, responses are faster than when the preceding prosody is inconsistent with accent occurring on that word. In the present series of experiments speech hybridisation techniques were first used to interchange the timing patterns within pairs of prosodic variants of utterances, independently of the pitch and intensity contours. The time-adjusted utterances could then serve as a basis for the orthogonal manipulation of the three prosodic dimensions of pitch, intensity and rhythm. The overall pattern of results showed that when listeners use prosody to predict accent location, they do not simply rely on a single prosodic dimension, but exploit the interaction between pitch, intensity and rhythm.
  • Cutler, A. (1998). How listeners find the right words. In Proceedings of the Sixteenth International Congress on Acoustics: Vol. 2 (pp. 1377-1380). Melville, NY: Acoustical Society of America.

    Abstract

    Languages contain tens of thousands of words, but these are constructed from a tiny handful of phonetic elements. Consequently, words resemble one another, or can be embedded within one another, a coup stick snot with standing. The process of spoken-word recognition by human listeners involves activation of multiple word candidates consistent with the input, and direct competition between activated candidate words. Further, human listeners are sensitive, at an early, prelexical, stage of speech processing, to constraints on what could potentially be a word of the language.
  • Cutler, A., Treiman, R., & Van Ooijen, B. (1998). Orthografik inkoncistensy ephekts in foneme detektion? In R. Mannell, & J. Robert-Ribes (Eds.), Proceedings of the Fifth International Conference on Spoken Language Processing: Vol. 6 (pp. 2783-2786). Sydney: ICSLP.

    Abstract

    The phoneme detection task is widely used in spoken word recognition research. Alphabetically literate participants, however, are more used to explicit representations of letters than of phonemes. The present study explored whether phoneme detection is sensitive to how target phonemes are, or may be, orthographically realised. Listeners detected the target sounds [b,m,t,f,s,k] in word-initial position in sequences of isolated English words. Response times were faster to the targets [b,m,t], which have consistent word-initial spelling, than to the targets [f,s,k], which are inconsistently spelled, but only when listeners’ attention was drawn to spelling by the presence in the experiment of many irregularly spelled fillers. Within the inconsistent targets [f,s,k], there was no significant difference between responses to targets in words with majority and minority spellings. We conclude that performance in the phoneme detection task is not necessarily sensitive to orthographic effects, but that salient orthographic manipulation can induce such sensitivity.
  • Cutler, A. (1991). Prosody in situations of communication: Salience and segmentation. In Proceedings of the Twelfth International Congress of Phonetic Sciences: Vol. 1 (pp. 264-270). Aix-en-Provence: Université de Provence, Service des publications.

    Abstract

    Speakers and listeners have a shared goal: to communicate. The processes of speech perception and of speech production interact in many ways under the constraints of this communicative goal; such interaction is as characteristic of prosodic processing as of the processing of other aspects of linguistic structure. Two of the major uses of prosodic information in situations of communication are to encode salience and segmentation, and these themes unite the contributions to the symposium introduced by the present review.
  • Cutler, A., & Butterfield, S. (1986). The perceptual integrity of initial consonant clusters. In R. Lawrence (Ed.), Speech and Hearing: Proceedings of the Institute of Acoustics (pp. 31-36). Edinburgh: Institute of Acoustics.
  • Cutler, A., & Carter, D. (1987). The prosodic structure of initial syllables in English. In J. Laver, & M. Jack (Eds.), Proceedings of the European Conference on Speech Technology: Vol. 1 (pp. 207-210). Edinburgh: IEE.
  • Cutler, A. (1998). The recognition of spoken words with variable representations. In D. Duez (Ed.), Proceedings of the ESCA Workshop on Sound Patterns of Spontaneous Speech (pp. 83-92). Aix-en-Provence: Université de Aix-en-Provence.
  • Cutler, A., & Bruggeman, L. (2013). Vocabulary structure and spoken-word recognition: Evidence from French reveals the source of embedding asymmetry. In Proceedings of INTERSPEECH: 14th Annual Conference of the International Speech Communication Association (pp. 2812-2816).

    Abstract

    Vocabularies contain hundreds of thousands of words built from only a handful of phonemes, so that inevitably longer words tend to contain shorter ones. In many languages (but not all) such embedded words occur more often word-initially than word-finally, and this asymmetry, if present, has far-reaching consequences for spoken-word recognition. Prior research had ascribed the asymmetry to suffixing or to effects of stress (in particular, final syllables containing the vowel schwa). Analyses of the standard French vocabulary here reveal an effect of suffixing, as predicted by this account, and further analyses of an artificial variety of French reveal that extensive final schwa has an independent and additive effect in promoting the embedding asymmetry.
  • Defina, R., & Majid, A. (2012). Conceptual event units of putting and taking in two unrelated languages. In N. Miyake, D. Peebles, & R. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1470-1475). Austin, TX: Cognitive Science Society.

    Abstract

    People automatically chunk ongoing dynamic events into discrete units. This paper investigates whether linguistic structure is a factor in this process. We test the claim that describing an event with a serial verb construction will influence a speaker’s conceptual event structure. The grammar of Avatime (a Kwa language spoken in Ghana) requires its speakers to describe some, but not all, placement events using a serial verb construction which also encodes the preceding taking event. We tested Avatime and English speakers’ recognition memory for putting and taking events. Avatime speakers were more likely to falsely recognize putting and taking events from episodes associated with take-put serial verb constructions than from episodes associated with other constructions. English speakers showed no difference in false recognitions between episode types. This demonstrates that memory for episodes is related to the type of language used; and, moreover, across languages different conceptual representations are formed for the same physical episode, paralleling habitual linguistic practices.
  • Dideriksen, C., Fusaroli, R., Tylén, K., Dingemanse, M., & Christiansen, M. H. (2019). Contextualizing Conversational Strategies: Backchannel, Repair and Linguistic Alignment in Spontaneous and Task-Oriented Conversations. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci 2019) (pp. 261-267). Montreal, QB: Cognitive Science Society.

    Abstract

    Do interlocutors adjust their conversational strategies to the specific contextual demands of a given situation? Prior studies have yielded conflicting results, making it unclear how strategies vary with demands. We combine insights from qualitative and quantitative approaches in a within-participant experimental design involving two different contexts: spontaneously occurring conversations (SOC) and task-oriented conversations (TOC). We systematically assess backchanneling, other-repair and linguistic alignment. We find that SOC exhibit a higher number of backchannels, a reduced and more generic repair format and higher rates of lexical and syntactic alignment. TOC are characterized by a high number of specific repairs and a lower rate of lexical and syntactic alignment. However, when alignment occurs, more linguistic forms are aligned. The findings show that conversational strategies adapt to specific contextual demands.
  • Dieuleveut, A., Van Dooren, A., Cournane, A., & Hacquard, V. (2019). Acquiring the force of modals: Sig you guess what sig means? In M. Brown, & B. Dailey (Eds.), BUCLD 43: Proceedings of the 43rd annual Boston University Conference on Language Development (pp. 189-202). Somerville, MA: Cascadilla Press.
  • Dingemanse, M., Hammond, J., Stehouwer, H., Somasundaram, A., & Drude, S. (2012). A high speed transcription interface for annotating primary linguistic data. In Proceedings of 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (pp. 7-12). Stroudsburg, PA: Association for Computational Linguistics.

    Abstract

    We present a new transcription mode for the annotation tool ELAN. This mode is designed to speed up the process of creating transcriptions of primary linguistic data (video and/or audio recordings of linguistic behaviour). We survey the basic transcription workflow of some commonly used tools (Transcriber, BlitzScribe, and ELAN) and describe how the new transcription interface improves on these existing implementations. We describe the design of the transcription interface and explore some further possibilities for improvement in the areas of segmentation and computational enrichment of annotations.
  • Dingemanse, M., & Majid, A. (2012). The semantic structure of sensory vocabulary in an African language. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 300-305). Austin, TX: Cognitive Science Society.

    Abstract

    The widespread occurrence of ideophones, large classes of words specialized in evoking sensory imagery, is little known outside linguistics and anthropology. Ideophones are a common feature in many of the world’s languages but are underdeveloped in English and other Indo-European languages. Here we study the meanings of ideophones in Siwu (a Kwa language from Ghana) using a pile-sorting task. The goal was to uncover the underlying structure of the lexical space and to examine the claimed link between ideophones and perception. We found that Siwu ideophones are principally organized around fine-grained aspects of sensory perception, and map onto salient psychophysical dimensions identified in sensory science. The results ratify ideophones as dedicated sensory vocabulary and underline the relevance of ideophones for research on language and perception.
  • Doherty, M., & Klein, W. (Eds.). (1991). Übersetzung [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (84).
  • Dolscheid, S., Graver, C., & Casasanto, D. (2013). Spatial congruity effects reveal metaphors, not markedness. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2213-2218). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0405/index.html.

    Abstract

    Spatial congruity effects have often been interpreted as evidence for metaphorical thinking, but an alternative markedness-based account challenges this view. In two experiments, we directly compared metaphor and markedness explanations for spatial congruity effects, using musical pitch as a testbed. English speakers who talk about pitch in terms of spatial height were tested in speeded space-pitch compatibility tasks. To determine whether space-pitch congruency effects could be elicited by any marked spatial continuum, participants were asked to classify high- and low-frequency pitches as 'high' and 'low' or as 'front' and 'back' (both pairs of terms constitute cases of marked continuums). We found congruency effects in high/low conditions but not in front/back conditions, indicating that markedness is not sufficient to account for congruity effects (Experiment 1). A second experiment showed that congruency effects were specific to spatial words that cued a vertical schema (tall/short), and that congruity effects were not an artifact of polysemy (e.g., 'high' referring both to space and pitch). Together, these results suggest that congruency effects reveal metaphorical uses of spatial schemas, not markedness effects.
  • Dolscheid, S., Hunnius, S., Casasanto, D., & Majid, A. (2012). The sound of thickness: Prelinguistic infants' associations of space and pitch. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 306-311). Austin, TX: Cognitive Science Society.

    Abstract

    People often talk about musical pitch in terms of spatial metaphors. In English, for instance, pitches can be high or low, whereas in other languages pitches are described as thick or thin. According to psychophysical studies, metaphors in language can also shape people’s nonlinguistic space-pitch representations. But does language establish mappings between space and pitch in the first place or does it modify preexisting associations? Here we tested 4-month-old Dutch infants’ sensitivity to height-pitch and thickness-pitch mappings in two preferential looking tasks. Dutch infants looked significantly longer at cross-modally congruent stimuli in both experiments, indicating that infants are sensitive to space-pitch associations prior to language. This early presence of space-pitch mappings suggests that these associations do not originate from language. Rather, language may build upon pre-existing mappings and change them gradually via some form of competitive associative learning.
  • Drozd, K. F. (1998). No as a determiner in child English: A summary of categorical evidence. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA '97 Conference on Language Acquisition (pp. 34-39). Edinburgh, UK: Edinburgh University Press.

    Abstract

    This paper summarizes the results of a descriptive syntactic category analysis of child English no which reveals that young children use and represent no as a determiner and negatives like no pen as NPs, contra standard analyses.
  • Drude, S., Trilsbeek, P., & Broeder, D. (2012). Language Documentation and Digital Humanities: The (DoBeS) Language Archive. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 169-173).

    Abstract

    Since the early nineties, the on-going dramatic loss of the world’s linguistic diversity has gained attention, first by the linguists and increasingly also by the general public. As a response, the new field of language documentation emerged from around 2000 on, starting with the funding initiative ‘Dokumentation Bedrohter Sprachen’ (DoBeS, funded by the Volkswagen foundation, Germany), soon to be followed by others such as the ‘Endangered Languages Documentation Programme’ (ELDP, at SOAS, London), or, in the USA, ‘Electronic Meta-structure for Endangered Languages Documentation’ (EMELD, led by the LinguistList) and ‘Documenting Endangered Languages’ (DEL, by the NSF). From its very beginning, the new field focused on digital technologies not only for recording in audio and video, but also for annotation, lexical databases, corpus building and archiving, among others. This development not just coincides but is intrinsically interconnected with the increasing focus on digital data, technology and methods in all sciences, in particular in the humanities.
  • Drude, S., Broeder, D., Trilsbeek, P., & Wittenburg, P. (2012). The Language Archive: A new hub for language resources. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3264-3267). European Language Resources Association (ELRA).

    Abstract

    This contribution presents “The Language Archive” (TLA), a new unit at the MPI for Psycholinguistics, discussing the current developments in management of scientific data, considering the need for new data research infrastructures. Although several initiatives worldwide in the realm of language resources aim at the integration, preservation and mobilization of research data, the state of such scientific data is still often problematic. Data are often not well organized and archived and not described by metadata; even unique data such as field-work observational data on endangered languages is still mostly on perishable carriers. New data centres are needed that provide trusted, quality-reviewed, persistent services and suitable tools and that take legal and ethical issues seriously. The CLARIN initiative has established criteria for suitable centres. TLA is in a good position to be one such centre. It is based on three essential pillars: (1) A data archive; (2) management, access and annotation tools; (3) archiving and software expertise for collaborative projects. The archive hosts mostly observational data on small languages worldwide and language acquisition data, but also data resulting from experiments.
  • Durco, M., & Windhouwer, M. (2013). Semantic Mapping in CLARIN Component Metadata. In Proceedings of MTSR 2013, the 7th Metadata and Semantics Research Conference (pp. 163-168). New York: Springer.

    Abstract

    In recent years, large-scale initiatives like CLARIN set out to overcome the notorious heterogeneity of metadata formats in the domain of language resources. The CLARIN Component Metadata Infrastructure established means for flexible resource descriptions for the domain of language resources. The Data Category Registry ISOcat and the accompanying Relation Registry foster semantic interoperability within the growing heterogeneous collection of metadata records. This paper describes the CMD Infrastructure, focusing on the facilities for semantic mapping, and also gives an overview of the current status in the joint component metadata domain.
  • Eijk, L., Ernestus, M., & Schriefers, H. (2019). Alignment of pitch and articulation rate. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2690-2694). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    Previous studies have shown that speakers align their speech to each other at multiple linguistic levels. This study investigates whether alignment is mostly the result of priming from the immediately preceding speech materials, focussing on pitch and articulation rate (AR). Native Dutch speakers completed sentences, first by themselves (pre-test), then in alternation with Confederate 1 (Round 1), with Confederate 2 (Round 2), with Confederate 1 again (Round 3), and lastly by themselves again (post-test). Results indicate that participants aligned to the confederates and that this alignment lasted during the post-test. The confederates’ directly preceding sentences were not good predictors for the participants’ pitch and AR. Overall, the results indicate that alignment is more of a global effect than a local priming effect.
  • Eisner, F. (2012). Competition in the acoustic encoding of emotional speech. In L. McCrohon (Ed.), Five approaches to language evolution. Proceedings of the workshops of the 9th International Conference on the Evolution of Language (pp. 43-44). Tokyo: Evolang9 Organizing Committee.

    Abstract

    Speech conveys not only linguistic meaning but also paralinguistic information, such as features of the speaker’s social background, physiology, and emotional state. Linguistic and paralinguistic information is encoded in speech by using largely the same vocal apparatus and both are transmitted simultaneously in the acoustic signal, drawing on a limited set of acoustic cues. How this simultaneous encoding is achieved, how the different types of information are disentangled by the listener, and how much they interfere with one another is presently not well understood. Previous research has highlighted the importance of acoustic source and filter cues for emotion and linguistic encoding respectively, which may suggest that the two types of information are encoded independently of each other. However, those lines of investigation have been almost completely disconnected (Murray & Arnott, 1993).
  • Elbers, W., Broeder, D., & Van Uytvanck, D. (2012). Proper language resource centers. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3260-3263). European Language Resources Association (ELRA).

    Abstract

    Language resource centers allow researchers to reliably deposit their structured data together with associated meta data and run services operating on this deposited data. We are looking into possibilities to create long-term persistency of both the deposited data and the services operating on this data. Challenges, both technical and non-technical, that need to be solved are the need to replicate more than just the data, proper identification of the digital objects in a distributed environment by making use of persistent identifiers and the set-up of a proper authentication and authorization domain including the management of the authorization information on the digital objects. We acknowledge the investment that most language resource centers have made in their current infrastructure. Therefore one of the most important requirements is the loose coupling with existing infrastructures without the need to make many changes. This shift from a single language resource center into a federated environment of many language resource centers is discussed in the context of a real world center: The Language Archive supported by the Max Planck Institute for Psycholinguistics.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Evaluating dictation task measures for the study of speech perception. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 383-387). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This paper shows that the dictation task, a well-known testing instrument in language education, has untapped potential as a research tool for studying speech perception. We describe how transcriptions can be scored on measures of lexical, orthographic, phonological, and semantic similarity to target phrases to provide comprehensive information about accuracy at different processing levels. The former three measures are automatically extractable, increasing objectivity, and the middle two are gradient, providing finer-grained information than traditionally used. We evaluate the measures in an English dictation task featuring phonetically reduced continuous speech. Whereas the lexical and orthographic measures emphasize listeners’ word identification difficulties, the phonological measure demonstrates that listeners can often still recover phonological features, and the semantic measure captures their ability to get the gist of the utterances. Correlational analyses and a discussion of practical and theoretical considerations show that combining multiple measures improves the dictation task’s utility as a research tool.
  • Felker, E. R., Ernestus, M., & Broersma, M. (2019). Lexically guided perceptual learning of a vowel shift in an interactive L2 listening context. In Proceedings of Interspeech 2019 (pp. 3123-3127). doi:10.21437/Interspeech.2019-1414.

    Abstract

    Lexically guided perceptual learning has traditionally been studied with ambiguous consonant sounds to which native listeners are exposed in a purely receptive listening context. To extend previous research, we investigate whether lexically guided learning applies to a vowel shift encountered by non-native listeners in an interactive dialogue. Dutch participants played a two-player game in English in either a control condition, which contained no evidence for a vowel shift, or a lexically constraining condition, in which onscreen lexical information required them to re-interpret their interlocutor’s /ɪ/ pronunciations as representing /ε/. A phonetic categorization pre-test and post-test were used to assess whether the game shifted listeners’ phonemic boundaries such that more of the /ε/-/ɪ/ continuum came to be perceived as /ε/. Both listener groups showed an overall post-test shift toward /ɪ/, suggesting that vowel perception may be sensitive to directional biases related to properties of the speaker’s vowel space. Importantly, listeners in the lexically constraining condition made relatively more post-test /ε/ responses than the control group, thereby exhibiting an effect of lexically guided adaptation. The results thus demonstrate that non-native listeners can adjust their phonemic boundaries on the basis of lexical information to accommodate a vowel shift learned in interactive conversation.
  • Fisher, S. E., & Tilot, A. K. (Eds.). (2019). Bridging senses: Novel insights from synaesthesia [Special Issue]. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 374.
  • Fitch, W. T., Friederici, A. D., & Hagoort, P. (Eds.). (2012). Pattern perception and computational complexity [Special Issue]. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 367 (1598).
  • Flecken, M., & Gerwien, J. (2013). Grammatical aspect modulates event duration estimations: findings from Dutch. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual meeting of the Cognitive Science Society (CogSci 2013) (pp. 2309-2314). Austin, TX: Cognitive Science Society.
  • Friederici, A., & Levelt, W. J. M. (1987). Spatial description in microgravity: Aspects of cognitive adaptation. In P. R. Sahm, R. Jansen, & M. Keller (Eds.), Proceedings of the Norderney Symposium on Scientific Results of the German Spacelab Mission D1 (pp. 518-524). Köln, Germany: Wissenschaftliche Projektführung D1 c/o DFVLR.
  • Frost, R. L. A., Isbilen, E. S., Christiansen, M. H., & Monaghan, P. (2019). Testing the limits of non-adjacent dependency learning: Statistical segmentation and generalisation across domains. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 1787-1793). Montreal, QB: Cognitive Science Society.

    Abstract

    Achieving linguistic proficiency requires identifying words from speech, and discovering the constraints that govern the way those words are used. In a recent study of non-adjacent dependency learning, Frost and Monaghan (2016) demonstrated that learners may perform these tasks together, using similar statistical processes, contrary to prior suggestions. However, in their study, non-adjacent dependencies were marked by phonological cues (plosive-continuant-plosive structure), which may have influenced learning. Here, we test the necessity of these cues by comparing learning across three conditions: fixed phonology, which contains these cues; varied phonology, which omits them; and shapes, which uses visual shape sequences to assess the generality of statistical processing for these tasks. Participants segmented the sequences and generalized the structure in both auditory conditions, but learning was best when phonological cues were present. Learning was around chance on both tasks for the visual shapes group, indicating statistical processing may critically differ across domains.
  • De la Fuente, J., Santiago, J., Roma, A., Dumitrache, C., & Casasanto, D. (2012). Facing the past: cognitive flexibility in the front-back mapping of time [Abstract]. Cognitive Processing; Special Issue "ICSC 2012, the 5th International Conference on Spatial Cognition: Space and Embodied Cognition". Poster Presentations, 13(Suppl. 1), S58.

    Abstract

    In many languages the future is in front and the past behind, but in some cultures (like Aymara) the past is in front. Is it possible to find this mapping as an alternative conceptualization of time in other cultures? If so, what are the factors that affect its choice out of the set of available alternatives? In a paper and pencil task, participants placed future or past events either in front or behind a character (a schematic head viewed from above). A sample of 24 Islamic participants (whose language also places the future in front and the past behind) tended to locate the past event in the front box more often than Spanish participants. This result might be due to the greater cultural value assigned to tradition in Islamic culture. The same pattern was found in a sample of Spanish elders (N = 58), which may support that conclusion. Alternatively, the crucial factor may be the amount of attention paid to the past. In a final study, young Spanish adults (N = 200) who had just answered a set of questions about their past showed the past-in-front pattern, whereas questions about their future exacerbated the future-in-front pattern. Thus, the attentional explanation was supported: attended events are mapped to front space in agreement with the experiential connection between attending and seeing. When attention is paid to the past, it tends to occupy the front location in spite of available alternative mappings in the language-culture.
  • Galke, L., Vagliano, I., & Scherp, A. (2019). Can graph neural networks go „online“? An analysis of pretraining and inference. In Proceedings of the Representation Learning on Graphs and Manifolds: ICLR2019 Workshop.

    Abstract

    Large-scale graph data in real-world applications is often not static but dynamic, i.e., new nodes and edges appear over time. Current graph convolution approaches are promising, especially when all the graph’s nodes and edges are available during training. When unseen nodes and edges are inserted after training, it is not yet evaluated whether up-training or re-training from scratch is preferable. We construct an experimental setup, in which we insert previously unseen nodes and edges after training and conduct a limited amount of inference epochs. In this setup, we compare adapting pretrained graph neural networks against retraining from scratch. Our results show that pretrained models yield high accuracy scores on the unseen nodes and that pretraining is preferable over retraining from scratch. Our experiments represent a first step to evaluate and develop truly online variants of graph neural networks.
  • Galke, L., Melnychuk, T., Seidlmayer, E., Trog, S., Foerstner, K., Schultz, C., & Tochtermann, K. (2019). Inductive learning of concept representations from library-scale bibliographic corpora. In K. David, K. Geihs, M. Lange, & G. Stumme (Eds.), Informatik 2019: 50 Jahre Gesellschaft für Informatik - Informatik für Gesellschaft (pp. 219-232). Bonn: Gesellschaft für Informatik e.V. doi:10.18420/inf2019_26.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). Automatic sign language identification. In Proceedings of the 20th IEEE International Conference on Image Processing (ICIP) (pp. 2626-2630).

    Abstract

    We propose a Random-Forest based sign language identification system. The system uses low-level visual features and is based on the hypothesis that sign languages have varying distributions of phonemes (hand-shapes, locations and movements). We evaluated the system on two sign languages -- British SL and Greek SL, both taken from a publicly available corpus, called Dicta Sign Corpus. Achieved average F1 scores are about 95% - indicating that sign languages can be identified with high accuracy using only low-level visual features.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). Automatic signer diarization - the mover is the signer approach. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on (pp. 283-287). doi:10.1109/CVPRW.2013.49.

    Abstract

    We present a vision-based method for signer diarization -- the task of automatically determining "who signed when?" in a video. This task has similar motivations and applications as speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more movements than their interlocutors. Experiments on four videos (a total of 1.4 hours and each consisting of two signers) show the applicability of the method. The best diarization error rate (DER) obtained is 0.16.
  • Gebre, B. G., & Wittenburg, P. (2012). Adaptive automatic gesture stroke detection. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 458-461).

    Abstract

    Many gesture and sign language researchers manually annotate video recordings to systematically categorize, analyze and explain their observations. The number and kinds of annotations are so diverse and unpredictable that any attempt at developing non-adaptive automatic annotation systems is usually less effective. The trend in the literature has been to develop models that work for average users and for average scenarios. This approach has three main disadvantages. First, it is impossible to know beforehand all the patterns that could be of interest to all researchers. Second, it is practically impossible to find enough training examples for all patterns. Third, it is currently impossible to learn a model that is robustly applicable across all video quality-recording variations.
  • Gebre, B. G., Zampieri, M., Wittenburg, P., & Heskes, T. (2013). Improving Native Language Identification with TF-IDF weighting. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 216-223).

    Abstract

    This paper presents a Native Language Identification (NLI) system based on TF-IDF weighting schemes and using linear classifiers - support vector machines, logistic regressions and perceptrons. The system was one of the participants of the 2013 NLI Shared Task in the closed-training track, achieving 0.814 overall accuracy for a set of 11 native languages. This accuracy was only 2.2 percentage points lower than the winner's performance. Furthermore, with subsequent evaluations using 10-fold cross-validation (as given by the organizers) on the combined training and development data, the best average accuracy obtained is 0.8455 and the features that contributed to this accuracy are the TF-IDF of the combined unigrams and bigrams of words.
  • Gebre, B. G., Wittenburg, P., & Heskes, T. (2013). The gesturer is the speaker. In Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013) (pp. 3751-3755).

    Abstract

    We present and solve the speaker diarization problem in a novel way. We hypothesize that the gesturer is the speaker and that identifying the gesturer can be taken as identifying the active speaker. We provide evidence in support of the hypothesis from gesture literature and audio-visual synchrony studies. We also present a vision-only diarization algorithm that relies on gestures (i.e. upper body movements). Experiments carried out on 8.9 hours of a publicly available dataset (the AMI meeting data) show that diarization error rates as low as 15% can be achieved.
  • Gebre, B. G., Wittenburg, P., & Lenkiewicz, P. (2012). Towards automatic gesture stroke detection. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 231-235). European Language Resources Association.

    Abstract

    Automatic annotation of gesture strokes is important for many gesture and sign language researchers. The unpredictable diversity of human gestures and video recording conditions require that we adopt a more adaptive case-by-case annotation model. In this paper, we present a work-in-progress annotation model that allows a user to a) track hands/face, b) extract features, and c) distinguish strokes from non-strokes. The hands/face tracking is done with color matching algorithms and is initialized by the user. The initialization process is supported with immediate visual feedback. Sliders are also provided to support a user-friendly adjustment of skin color ranges. After successful initialization, features related to positions, orientations and speeds of tracked hands/face are extracted using unique identifiable features (corners) from a window of frames and are used for training a learning algorithm. Our preliminary results for stroke detection under non-ideal video conditions are promising and show the potential applicability of our methodology.
  • Gijssels, T., Bottini, R., Rueschemeyer, S.-A., & Casasanto, D. (2013). Space and time in the parietal cortex: fMRI Evidence for a neural asymmetry. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 495-500). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0113/index.html.

    Abstract

    How are space and time related in the brain? This study contrasts two proposals that make different predictions about the interaction between spatial and temporal magnitudes. Whereas ATOM implies that space and time are symmetrically related, Metaphor Theory claims they are asymmetrically related. Here we investigated whether space and time activate the same neural structures in the inferior parietal cortex (IPC) and whether the activation is symmetric or asymmetric across domains. We measured participants’ neural activity while they made temporal and spatial judgments on the same visual stimuli. The behavioral results replicated earlier observations of a space-time asymmetry: Temporal judgments were more strongly influenced by irrelevant spatial information than vice versa. The BOLD fMRI data indicated that space and time activated overlapping clusters in the IPC and that, consistent with Metaphor Theory, this activation was asymmetric: The shared region of IPC was activated more strongly during temporal judgments than during spatial judgments. We consider three possible interpretations of this neural asymmetry, based on 3 possible functions of IPC.
  • Gisladottir, R. S., Chwilla, D., Schriefers, H., & Levinson, S. C. (2012). Speech act recognition in conversation: Experimental evidence. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1596-1601). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2012/papers/0282/index.html.

    Abstract

    Recognizing the speech acts in our interlocutors’ utterances is a crucial prerequisite for conversation. However, it is not a trivial task given that the form and content of utterances is frequently underspecified for this level of meaning. In the present study we investigate participants’ competence in categorizing speech acts in such action-underspecific sentences and explore the time-course of speech act inferencing using a self-paced reading paradigm. The results demonstrate that participants are able to categorize the speech acts with very high accuracy, based on limited context and without any prosodic information. Furthermore, the results show that the exact same sentence is processed differently depending on the speech act it performs, with reading times starting to differ already at the first word. These results indicate that participants are very good at “getting” the speech acts, opening up a new arena for experimental research on action recognition in conversation.
  • Goldrick, M., Brehm, L., Cho, P. W., & Smolensky, P. (2019). Transient blend states and discrete agreement-driven errors in sentence production. In G. J. Snover, M. Nelson, B. O'Connor, & J. Pater (Eds.), Proceedings of the Society for Computation in Linguistics (SCiL 2019) (pp. 375-376). doi:10.7275/n0b2-5305.
  • Le Guen, O. (2012). Socializing with the supernatural: The place of supernatural entities in Yucatec Maya daily life and socialization. In P. Nondédéo, & A. Breton (Eds.), Maya daily lives: Proceedings of the 13th European Maya Conference (pp. 151-170). Markt Schwaben: Verlag Anton Saurwein.
  • Gussenhoven, C., & Zhou, W. (2013). Revisiting pitch slope and height effects on perceived duration. In Proceedings of INTERSPEECH 2013: 14th Annual Conference of the International Speech Communication Association (pp. 1365-1369).

    Abstract

    The shape of pitch contours has been shown to have an effect on the perceived duration of vowels. For instance, vowels with high level pitch and vowels with falling contours sound longer than vowels with low level pitch. Depending on whether the comparison is between level pitches or between level and dynamic contours, these findings have been interpreted in two ways. For inter-level comparisons, where the duration results are the reverse of production results, a hypercorrection strategy in production has been proposed [1]. By contrast, for comparisons between level pitches and dynamic contours, the longer production data for dynamic contours have been held responsible. We report an experiment with Dutch and Chinese listeners which aimed to show that production data and perception data are each other’s opposites for high, low, falling and rising contours. We explain the results, which are consistent with earlier findings, in terms of the compensatory listening strategy of [2], arguing that the perception effects are due to a perceptual compensation of articulatory strategies and constraints, rather than that differences in production compensate for psycho-acoustic perception effects.
  • Habscheid, S., & Klein, W. (Eds.). (2012). Dinge und Maschinen in der Kommunikation [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, 42(168).

    Abstract

    “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” (Weiser 1991, p. 94). This claim comes from a much-cited text by Mark Weiser, formerly Chief Technology Officer at the famous Xerox Palo Alto Research Center (PARC), where not only several important innovations in computer technology originated but fundamental anthropological insights into how people deal with technical artifacts were also gained. In a popular-science article titled “The Computer for the 21st Century”, Weiser in 1991 sketched the vision of a future in which we no longer work with a single PC at our desk; rather, in every room we would be surrounded by hundreds of electronic devices that are inseparably embedded in everyday objects and have therefore, as it were, “disappeared” into our material environment. Weiser was not concerned solely with the ubiquitous phenomenon known in media theory as the “transparency of media”, or, in more general theories of everyday experience, as the taken-for-granted interwovenness of humans with things whose meaning is familiar to us and which are practically “ready-to-hand”. Beyond that, Weiser's vision aimed to augment our already existing environment with computer-readable data and to integrate everyday practices more or less seamlessly into the operations of such an omnipresent network: in the world Weiser sketches, doors open for whoever wears a particular electronic badge, rooms greet the people who enter them by name, computer terminals adapt to the preferences of individual users, and so on (Weiser 1991, p. 99).
  • Haderlein, T., Moers, C., Möbius, B., & Nöth, E. (2012). Automatic rating of hoarseness by text-based cepstral and prosodic evaluation. In P. Sojka, A. Horák, I. Kopecek, & K. Pala (Eds.), Proceedings of the 15th International Conference on Text, Speech and Dialogue (TSD 2012) (pp. 573-580). Heidelberg: Springer.

    Abstract

    The standard for the analysis of distorted voices is perceptual rating of read-out texts or spontaneous speech. Automatic voice evaluation, however, is usually done on stable sections of sustained vowels. In this paper, text-based and established vowel-based analysis are compared with respect to their ability to measure hoarseness and its subclasses. 73 hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text “The North Wind and the Sun”. Five speech therapists and physicians rated roughness, breathiness, and hoarseness according to the German RBH evaluation scheme. The best human-machine correlations were obtained for measures based on the Cepstral Peak Prominence (CPP; up to |r| = 0.73). Support Vector Regression (SVR) on CPP-based measures and prosodic features improved the results further to r ≈ 0.8 and confirmed that automatic voice evaluation should be performed on a text recording.
  • Hahn, L. E., Ten Buuren, M., De Nijs, M., Snijders, T. M., & Fikkert, P. (2019). Acquiring novel words in a second language through mutual play with child songs - The Noplica Energy Center. In L. Nijs, H. Van Regenmortel, & C. Arculus (Eds.), MERYC19 Counterpoints of the senses: Bodily experiences in musical learning (pp. 78-87). Ghent, Belgium: EuNet MERYC 2019.

    Abstract

    Child songs are a great source for linguistic learning. Here we explore whether children can acquire novel words in a second language by playing a game featuring child songs in a playhouse. We present data from three studies that serve as scientific proof for the functionality of one game of the playhouse: the Energy Center. For this game, three hand-bikes were mounted on a panel. When children start moving the hand-bikes, child songs start playing simultaneously. Once the children produce enough energy with the hand-bikes, the songs are additionally accompanied with the sounds of musical instruments. In our studies, children executed a picture-selection task to evaluate whether they acquired new vocabulary from the songs presented during the game. Two of our studies were run in the field, one at a Dutch and one at an Indian pre-school. The third study features data from a more controlled laboratory setting. Our results partly confirm that the Energy Center is a successful means to support vocabulary acquisition in a second language. More research with larger sample sizes and longer access to the Energy Center is needed to evaluate the overall functionality of the game. Based on informal observations at our test sites, however, we are certain that children do pick up linguistic content from the songs during play, as many of the children repeat words and phrases from songs they heard. We will follow up on these promising observations in future studies.
  • Hammarström, H., & van den Heuvel, W. (Eds.). (2012). On the history, contact & classification of Papuan languages [Special Issue]. Language & Linguistics in Melanesia, 2012. Retrieved from http://www.langlxmelanesia.com/specialissues.htm.
  • Hanique, I., & Ernestus, M. (2012). The processes underlying two frequent casual speech phenomena in Dutch: A production experiment. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2011-2014).

    Abstract

    This study investigated whether a shadowing task can provide insights into the nature of reduction processes that are typical of casual speech. We focused on the shortening and presence versus absence of schwa and /t/ in Dutch past participles. Results showed that the absence of these segments was affected by the same variables as their shortening, suggesting that absence mostly resulted from extreme gradient shortening. This contrasts with results based on recordings of spontaneous conversations. We hypothesize that this difference is due to non-casual fast speech elicited by a shadowing task.
  • Heilbron, M., Ehinger, B., Hagoort, P., & De Lange, F. P. (2019). Tracking naturalistic linguistic predictions with deep neural language models. In Proceedings of the 2019 Conference on Cognitive Computational Neuroscience (pp. 424-427). doi:10.32470/CCN.2019.1096-0.

    Abstract

    Prediction in language has traditionally been studied using simple designs in which neural responses to expected and unexpected words are compared in a categorical fashion. However, these designs have been contested as being ‘prediction encouraging’, potentially exaggerating the importance of prediction in language understanding. A few recent studies have begun to address these worries by using model-based approaches to probe the effects of linguistic predictability in naturalistic stimuli (e.g. continuous narrative). However, these studies so far only looked at very local forms of prediction, using models that take no more than the prior two words into account when computing a word’s predictability. Here, we extend this approach using a state-of-the-art neural language model that can take roughly 500 times longer linguistic contexts into account. Predictability estimates from the neural network offer a much better fit to EEG data from subjects listening to naturalistic narrative than simpler models, and reveal strong surprise responses akin to the P200 and N400. These results show that predictability effects in language are not a side-effect of simple designs, and demonstrate the practical use of recent advances in AI for the cognitive neuroscience of language.
  • Holler, J., Schubotz, L., Kelly, S., Schuetze, M., Hagoort, P., & Ozyurek, A. (2013). Here's not looking at you, kid! Unaddressed recipients benefit from co-speech gestures when speech processing suffers. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2560-2565). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0463/index.html.

    Abstract

    In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from these different modalities, and how perceived communicative intentions, often signaled through visual signals, such as eye gaze, may influence this processing. We address this question by simulating a triadic communication context in which a speaker alternated her gaze between two different recipients. Participants thus viewed speech-only or speech+gesture object-related utterances when being addressed (direct gaze) or unaddressed (averted gaze). Two object images followed each message and participants’ task was to choose the object that matched the message. Unaddressed recipients responded significantly slower than addressees for speech-only utterances. However, perceiving the same speech accompanied by gestures sped them up to a level identical to that of addressees. That is, when speech processing suffers due to not being addressed, gesture processing remains intact and enhances the comprehension of a speaker’s message.
  • Holler, J., Kelly, S., Hagoort, P., & Ozyurek, A. (2012). When gestures catch the eye: The influence of gaze direction on co-speech gesture comprehension in triadic communication. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 467-472). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2012/papers/0092/index.html.

    Abstract

    Co-speech gestures are an integral part of human face-to-face communication, but little is known about how pragmatic factors influence our comprehension of those gestures. The present study investigates how different types of recipients process iconic gestures in a triadic communicative situation. Participants (N = 32) took on the role of one of two recipients in a triad and were presented with 160 video clips of an actor speaking, or speaking and gesturing. Crucially, the actor’s eye gaze was manipulated in that she alternated her gaze between the two recipients. Participants thus perceived some messages in the role of addressed recipient and some in the role of unaddressed recipient. In these roles, participants were asked to make judgements concerning the speaker’s messages. Their reaction times showed that unaddressed recipients did comprehend speaker’s gestures differently to addressees. The findings are discussed with respect to automatic and controlled processes involved in gesture comprehension.
  • Irvine, L., Roberts, S. G., & Kirby, S. (2013). A robustness approach to theory building: A case study of language evolution. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (CogSci 2013) (pp. 2614-2619). Retrieved from http://mindmodeling.org/cogsci2013/papers/0472/index.html.

    Abstract

    Models of cognitive processes often include simplifications, idealisations, and fictionalisations, so how should we learn about cognitive processes from such models? Particularly in cognitive science, when many features of the target system are unknown, it is not always clear which simplifications, idealisations, and so on, are appropriate for a research question, and which are highly misleading. Here we use a case-study from studies of language evolution, and ideas from philosophy of science, to illustrate a robustness approach to learning from models. Robust properties are those that arise across a range of models, simulations and experiments, and can be used to identify key causal structures in the models, and the phenomenon, under investigation. For example, in studies of language evolution, the emergence of compositional structure is a robust property across models, simulations and experiments of cultural transmission, but only under pressures for learnability and expressivity. This arguably illustrates the principles underlying real cases of language evolution. We provide an outline of the robustness approach, including its limitations, and suggest that this methodology can be productively used throughout cognitive science. Perhaps of most importance, it suggests that different modelling frameworks should be used as tools to identify the abstract properties of a system, rather than being definitive expressions of theories.
  • De Jong, N. H., & Bosker, H. R. (2013). Choosing a threshold for silent pauses to measure second language fluency. In R. Eklund (Ed.), Proceedings of the 6th Workshop on Disfluency in Spontaneous Speech (DiSS) (pp. 17-20).

    Abstract

    Second language (L2) research often involves analyses of acoustic measures of fluency. The studies investigating fluency, however, have been difficult to compare because the measures of fluency that were used differed widely. One of the differences between studies concerns the lower cut-off point for silent pauses, which has been set anywhere between 100 ms and 1000 ms. The goal of this paper is to find an optimal cut-off point. We calculate acoustic measures of fluency using different pause thresholds and then relate these measures to a measure of L2 proficiency and to ratings on fluency.
  • Joo, H., Jang, J., Kim, S., Cho, T., & Cutler, A. (2019). Prosodic structural effects on coarticulatory vowel nasalization in Australian English in comparison to American English. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 835-839). Canberra, Australia: Australasian Speech Science and Technology Association Inc.

    Abstract

    This study investigates effects of prosodic factors (prominence, boundary) on coarticulatory V-nasalization in Australian English (AusE) in CVN and NVC in comparison to those in American English (AmE). As in AmE, prominence was found to lengthen N, but to reduce V-nasalization, enhancing N’s nasality and V’s orality, respectively (paradigmatic contrast enhancement). But the prominence effect in CVN was more robust than that in AmE. Again similar to findings in AmE, boundary induced a reduction of N-duration and V-nasalization phrase-initially (syntagmatic contrast enhancement), and increased the nasality of both C and V phrase-finally. But AusE showed some differences in terms of the magnitude of V nasalization and N duration. The results suggest that the linguistic contrast enhancements underlie prosodic-structure modulation of coarticulatory V-nasalization in comparable ways across dialects, while the fine phonetic detail indicates that the phonetics-prosody interplay is internalized in the individual dialect’s phonetic grammar.
  • Kempen, G., & Harbusch, K. (1998). A 'tree adjoining' grammar without adjoining: The case of scrambling in German. In Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4).
  • Khetarpal, N., Neveu, G., Majid, A., Michael, L., & Regier, T. (2013). Spatial terms across languages support near-optimal communication: Evidence from Peruvian Amazonia, and computational analyses. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 764-769). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0158/index.html.

    Abstract

    Why do languages have the categories they do? It has been argued that spatial terms in the world’s languages reflect categories that support highly informative communication, and that this accounts for the spatial categories found across languages. However, this proposal has been tested against only nine languages, and in a limited fashion. Here, we consider two new languages: Maijɨki, an under-documented language of Peruvian Amazonia, and English. We analyze spatial data from these two new languages and the original nine, using thorough and theoretically targeted computational tests. The results support the hypothesis that spatial terms across dissimilar languages enable near-optimally informative communication, over an influential competing hypothesis.
  • Kita, S., van Gijn, I., & van der Hulst, H. (1998). Movement phases in signs and co-speech gestures, and their transcription by human coders. In Gesture and Sign-Language in Human-Computer Interaction (Lecture Notes in Artificial Intelligence - LNCS Subseries, Vol. 1371) (pp. 23-35). Berlin, Germany: Springer-Verlag.

    Abstract

    The previous literature has suggested that the hand movement in co-speech gestures and signs consists of a series of phases with qualitatively different dynamic characteristics. In this paper, we propose a syntagmatic rule system for movement phases that applies to both co-speech gestures and signs. Descriptive criteria for the rule system were developed for the analysis of video-recorded continuous production of signs and gesture. It involves segmenting a stream of body movement into phases and identifying different phase types. Two human coders used the criteria to analyze signs and co-speech gestures that are produced in natural discourse. It was found that the criteria yielded good inter-coder reliability. These criteria can be used for the technology of automatic recognition of signs and co-speech gestures in order to segment continuous production and identify the potentially meaning-bearing phase.
  • Klein, W. (2013). L'effettivo declino e la crescita potenziale della lessicografia tedesca. In N. Maraschio, D. De Martino, & G. Stanchina (Eds.), L'italiano dei vocabolari: Atti di La piazza delle lingue 2012 (pp. 11-20). Firenze: Accademia della Crusca.
  • Klein, W. (Ed.). (1998). Kaleidoskop [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (112).
  • Klein, W. (Ed.). (1987). Sprache und Ritual [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (65).
  • Klein, W. (Ed.). (1986). Sprachverfall [Special Issue]. Zeitschrift für Literaturwissenschaft und Linguistik, (62).
  • Lenkiewicz, A., & Drude, S. (2013). Automatic annotation of linguistic 2D and Kinect recordings with the Media Query Language for Elan. In Proceedings of Digital Humanities 2013 (pp. 276-278).

    Abstract

    Research in body language with use of gesture recognition and speech analysis has gained much attention in recent times, influencing disciplines related to image and speech processing.

    This study aims to design the Media Query Language (MQL) (Lenkiewicz, et al. 2012) combined with the Linguistic Media Query Interface (LMQI) for Elan (Wittenburg, et al. 2006). The system, integrated with recent advances in audio-video recognition, will allow querying media files with predefined gesture phases (or motion primitives) and speech characteristics as well as combinations of both. For the purpose of this work the predefined motions and speech characteristics are called patterns for atomic elements and actions for a sequence of patterns. The main assumption is that a user-customized library of patterns and actions and automated media annotation with LMQI will reduce annotation time, hence decreasing the costs of creating annotated corpora. An increase in the amount of annotated data should influence the speed and amount of research possible in disciplines in which human multimodal interaction is a subject of interest and where annotated corpora are required.
  • Lenkiewicz, P., Auer, E., Schreer, O., Masneri, S., Schneider, D., & Tschöpe, S. (2012). AVATecH ― automated annotation through audio and video analysis. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 209-214). European Language Resources Association.

    Abstract

    In different fields of the humanities annotations of multimodal resources are a necessary component of the research workflow. Examples include linguistics, psychology, anthropology, etc. However, creation of those annotations is a very laborious task, which can take 50 to 100 times the length of the annotated media, or more. This can be significantly improved by applying innovative audio and video processing algorithms, which analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, which is a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
  • Lenkiewicz, A., Lis, M., & Lenkiewicz, P. (2012). Linguistic concepts described with Media Query Language for automated annotation. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 477-479).

    Abstract

    Human spoken communication is multimodal, i.e. it encompasses both speech and gesture. Acoustic properties of voice, body movements, facial expression, etc. are an inherent and meaningful part of spoken interaction; they can provide attitudinal, grammatical and semantic information. In recent years interest in audio-visual corpora has been rising rapidly as they enable investigation of different communicative modalities and provide a more holistic view on communication (Kipp et al. 2009). Moreover, for some languages such corpora are the only available resource, as is the case for endangered languages for which no written resources exist.
  • Lenkiewicz, P., Van Uytvanck, D., Wittenburg, P., & Drude, S. (2012). Towards automated annotation of audio and video recordings by application of advanced web-services. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 1880-1883).

    Abstract

    In this paper we describe audio and video processing algorithms that are developed in the scope of the AVATecH project. The purpose of these algorithms is to shorten the time taken by manual annotation of audio and video recordings by extracting features from media files and creating semi-automated annotations. We show that the use of such supporting algorithms can shorten the annotation time to 30-50% of the time necessary to perform a fully manual annotation of the same kind.
  • Levelt, W. J. M. (1991). Lexical access in speech production: Stages versus cascading. In H. Peters, W. Hulstijn, & C. Starkweather (Eds.), Speech motor control and stuttering (pp. 3-10). Amsterdam: Excerpta Medica.
  • Levelt, W. J. M., & Schriefers, H. (1987). Stages of lexical access. In G. A. Kempen (Ed.), Natural language generation: new results in artificial intelligence, psychology and linguistics (pp. 395-404). Dordrecht: Nijhoff.
  • Levinson, S. C. (1987). Minimization and conversational inference. In M. Bertuccelli Papi, & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 61-129). Benjamins.
  • Liu, S., & Zhang, Y. (2019). Why some verbs are harder to learn than others – A micro-level analysis of everyday learning contexts for early verb learning. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st Annual Meeting of the Cognitive Science Society (CogSci 2019) (pp. 2173-2178). Montreal, QC: Cognitive Science Society.

    Abstract

    Verb learning is important for young children. While most previous research has focused on linguistic and conceptual challenges in early verb learning (e.g. Gentner, 1982, 2006), the present paper examined early verb learning at the attentional level and quantified the input for early verb learning by measuring verb-action co-occurrence statistics in parent-child interaction from the learner’s perspective. To do so, we used head-mounted eye tracking to record fine-grained multimodal behaviors during parent-infant joint play, and analyzed parent speech, parent and infant action, and infant attention at the moments when parents produced verb labels. Our results show great variability across different action verbs, in terms of frequency of verb utterances, frequency of corresponding actions related to verb meanings, and infants’ attention to verbs and actions, which provide new insights on why some verbs are harder to learn than others.
  • Mai, F., Galke, L., & Scherp, A. (2019). CBOW is not all you need: Combining CBOW with the compositional matrix space model. In Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). OpenReview.net.

    Abstract

    Continuous Bag of Words (CBOW) is a powerful text embedding method. Due to its strong capabilities to encode word content, CBOW embeddings perform well on a wide range of downstream tasks while being efficient to compute. However, CBOW is not capable of capturing the word order. The reason is that the computation of CBOW's word embeddings is commutative, i.e., embeddings of XYZ and ZYX are the same. In order to address this shortcoming, we propose a learning algorithm for the Continuous Matrix Space Model, which we call Continual Multiplication of Words (CMOW). Our algorithm is an adaptation of word2vec, so that it can be trained on large quantities of unlabeled text. We empirically show that CMOW better captures linguistic properties, but it is inferior to CBOW in memorizing word content. Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. Our results show that the hybrid CBOW-CMOW-model retains CBOW's strong ability to memorize word content while at the same time substantially improving its ability to encode other linguistic information by 8%. As a result, the hybrid also performs better on 8 out of 11 supervised downstream tasks with an average improvement of 1.2%.
  • Majid, A. (2013). Olfactory language and cognition. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual meeting of the Cognitive Science Society (CogSci 2013) (p. 68). Austin, TX: Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2013/papers/0025/index.html.

    Abstract

    Since the cognitive revolution, a widely held assumption has been that—whereas content may vary across cultures—cognitive processes would be universal, especially those on the more basic levels. Even if scholars do not fully subscribe to this assumption, they often conceptualize, or tend to investigate, cognition as if it were universal (Henrich, Heine, & Norenzayan, 2010). The insight that universality must not be presupposed but scrutinized is now gaining ground, and cognitive diversity has become one of the hot (and controversial) topics in the field (Norenzayan & Heine, 2005). We argue that, for scrutinizing the cultural dimension of cognition, taking an anthropological perspective is invaluable, not only for the task itself, but for attenuating the home-field disadvantages that are inescapably linked to cross-cultural research (Medin, Bennis, & Chandler, 2010).
  • Majid, A. (2012). Taste in twenty cultures [Abstract]. Abstracts from the XXIth Congress of European Chemoreception Research Organization, ECRO-2011. Publ. in Chemical Senses, 37(3), A10.

    Abstract

    Scholars disagree about the extent to which language can tell us about conceptualisation of the world. Some believe that language is a direct window onto concepts: Having a word “bird”, “table” or “sour” presupposes the corresponding underlying concept, BIRD, TABLE, SOUR. Others disagree. Words are thought to be uninformative, or worse, misleading about our underlying conceptual representations; after all, our mental worlds are full of ideas that we struggle to express in language. How could this be so, argue sceptics, if language were a direct window on our inner life? In this presentation, I consider what language can tell us about the conceptualisation of taste. By considering linguistic data from twenty unrelated cultures – varying in subsistence mode (hunter-gatherer to industrial), ecological zone (rainforest jungle to desert), dwelling type (rural and urban), and so forth – I argue any single language is, indeed, impoverished about what it can reveal about taste. But recurrent lexicalisation patterns across languages can provide valuable insights about human taste experience. Moreover, language patterning is part of the data that a good theory of taste perception has to be answerable for. Taste researchers, therefore, cannot ignore the crosslinguistic facts.
