Publications

  • Timmer, K., Ganushchak, L. Y., Mitlina, Y., & Schiller, N. O. (2013). Choosing first or second language phonology in 125 ms [Abstract]. Journal of Cognitive Neuroscience, 25 Suppl., 164.

    Abstract

    We are often in a bilingual situation (e.g., overhearing a conversation on the train). We investigated whether first (L1) and second language (L2) phonologies are automatically activated. A masked priming paradigm was used, with Russian words as targets and either Russian or English words as primes. Event-related potentials (ERPs) were recorded while Russian (L1) – English (L2) bilinguals read aloud L1 target words (e.g., РЕЙС /reis/ ‘flight’) primed with either L1 (e.g., РАНА /rana/ ‘wound’) or L2 words (e.g., PACK). Target words were read faster when they were preceded by phonologically related L1 primes but not by orthographically related L2 primes. ERPs showed orthographic priming in the 125-200 ms time window. Thus, both L1 and L2 phonologies are simultaneously activated during L1 reading. The results provide support for non-selective models of bilingual reading, which assume automatic activation of the non-target language phonology even when it is not required by the task.
  • Turco, G., & Gubian, M. (2012). L1 prosodic transfer and priming effects: A quantitative study on semi-spontaneous dialogues. In Q. Ma, H. Ding, & D. Hirst (Eds.), Proceedings of the 6th International Conference on Speech Prosody (pp. 386-389). International Speech Communication Association (ISCA).

    Abstract

    This paper presents a pilot investigation of primed accentuation patterns produced by advanced Dutch speakers of Italian as a second language (L2). Contrastive accent patterns within prepositional phrases were elicited in a semi-spontaneous dialogue with a confederate native speaker of Italian. The aim of the analysis was to compare the learners’ contrastive accentual configurations induced by the confederate speaker’s prime with those produced by native speakers of Italian and Dutch under the same testing conditions. F0 and speech rate data were analysed by applying powerful data-driven techniques available in the Functional Data Analysis statistical framework. Results reveal different accentual configurations in L1 and L2 Italian in response to the confederate’s prime. We conclude that the learners’ accentual patterns mirror those produced by their L1 control group (prosodic-transfer hypothesis), although the hypothesis of a transient priming effect on learners’ choice of contrastive patterns cannot be completely ruled out.
  • Ünal, E., & Papafragou, A. (2013). Linguistic and conceptual representations of inference as a knowledge source. In S. Baiz, N. Goldman, & R. Hawkes (Eds.), Proceedings of the 37th Annual Boston University Conference on Language Development (BUCLD 37) (pp. 433-443). Boston: Cascadilla Press.
  • Van Uytvanck, D., Stehouwer, H., & Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 1029-1034). European Language Resources Association (ELRA).

    Abstract

    In this paper we present the Virtual Language Observatory (VLO), a metadata-based portal for language resources. It is completely based on the Component Metadata (CMDI) and ISOcat standards. This approach allows for the use of heterogeneous metadata schemas while maintaining semantic compatibility. We describe the metadata harvesting process, based on OAI-PMH, and the conversion from several formats (OLAC, IMDI and the CLARIN LRT inventory) to their CMDI counterpart profiles. Then we focus on some post-processing steps to polish the harvested records. Next, the ingestion of the CMDI files into the VLO facet browser is described. We also include an overview of the changes since the first version of the VLO, based on user feedback from the CLARIN community. Finally, there is an overview of additional ideas and improvements for future versions of the VLO.
  • Van Putten, S. (2013). The meaning of the Avatime additive particle tsye. In M. Balbach, L. Benz, S. Genzel, M. Grubic, A. Renans, S. Schalowski, M. Stegenwallner, & A. Zeldes (Eds.), Information structure: Empirical perspectives on theory (pp. 55-74). Potsdam: Universitätsverlag Potsdam. Retrieved from http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:kobv:517-opus-64804.
  • Viebahn, M. C., Ernestus, M., & McQueen, J. M. (2012). Co-occurrence of reduced word forms in natural speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2019-2022).

    Abstract

    This paper presents a corpus study that investigates the co-occurrence of reduced word forms in natural speech. We extracted Dutch past participles from three different speech registers and investigated the influence of several predictor variables on the presence and duration of schwas in prefixes and /t/s in suffixes. Our results suggest that reduced word forms tend to co-occur even if we partial out the effect of speech rate. The implications of our findings for episodic and abstractionist models of lexical representation are discussed.
  • von Stutterheim, C., & Flecken, M. (Eds.). (2013). Principles of information organization in L2 discourse [Special Issue]. International Review of Applied Linguistics in Language Teaching (IRAL), 51(2).
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to neighboring segments and are less well identified. Indeed, many vowels, especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Windhouwer, M., Broeder, D., & Van Uytvanck, D. (2012). A CMD core model for CLARIN web services. In Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 41-48).

    Abstract

    In the CLARIN infrastructure, various national projects have started initiatives to allow users of the infrastructure to create chains or workflows of web services. The Component Metadata (CMD) core model for web services described in this paper tries to align the metadata descriptions of these various initiatives. This should allow chaining/workflow engines to find and invoke matching services. The paper describes the landscape of web services architectures and the state of the national initiatives. Based on this, a CMD core model for CLARIN is proposed, which, within some limits, can be adapted to the specific needs of an initiative by the standard facilities of CMD. The paper closes with the current state and usage of the model and a look into the future.
  • Windhouwer, M. (2012). RELcat: a Relation Registry for ISOcat data categories. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3661-3664). European Language Resources Association (ELRA).

    Abstract

    The ISOcat Data Category Registry basically contains a flat and easily extensible list of data category specifications. To foster reuse and standardization, only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks (possibly based on personal views) between various (application) domains and to overcome a possible proliferation of data categories, more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows arbitrary relationships to be stored. These relationships can reflect the personal view of one linguist or of a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. On the one hand, this allows existing sets of relations specified in, for example, an OWL (2) ontology or a SKOS taxonomy to be loaded. On the other hand, it allows algorithms that query the registry to traverse the stored semantic network while remaining ignorant of the original source vocabulary. This paper describes first experiences with RELcat and explains some initial design decisions.
  • Windhouwer, M. (2012). Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012 (SFLR 2012 workshop), September 19-21, 2012, Vienna (p. 494). Vienna: Österreichische Gesellschaft für Artificial Intelligence (ÖGAI).
  • Withers, P. (2012). Metadata management with Arbil. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou, & M. Monachini (Eds.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 72-75). European Language Resources Association (ELRA).

    Abstract

    Arbil is an application designed to create and manage metadata for research data and to arrange this data into a structure appropriate for archiving. The metadata is displayed in tables, which provides an overview of the metadata and allows many metadata sections to be populated and updated in bulk. Both IMDI and CLARIN metadata formats are supported, and Arbil has been designed as a local application so that it can also be used offline, for instance at remote field sites. The metadata can be entered in any order and at whatever stage the user is able to; once the metadata and its data are ready for archiving and an Internet connection is available, it can be exported from Arbil and, in the case of IMDI, transferred to the main archive via LAMUS (archive management and upload system).
  • Wittenburg, P., Lenkiewicz, P., Auer, E., Gebre, B. G., Lenkiewicz, A., & Drude, S. (2012). AV Processing in eHumanities - a paradigm shift. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 538-541).

    Abstract

    Speech research saw a dramatic paradigm change in the 1990s. While the discussion had earlier been dominated by the approach of phoneticians who knew about phenomena in the speech signal, the situation changed completely after stochastic machinery such as Hidden Markov Models [1] and Artificial Neural Networks [2] had been introduced. Speech processing was now dominated by a purely mathematical approach that basically ignored all existing knowledge about the speech production process and the perception mechanisms. The key was now to construct a training set large enough to allow the many free parameters of such stochastic engines to be identified. If the training set was representative and its annotations were largely ‘correct’, one could expect to obtain a satisfactorily functioning recognizer. While the success of knowledge-based systems such as Hearsay II [3] was limited, the statistically based approach led to great improvements in recognition rates and to industrial applications.
  • Wnuk, E., & Majid, A. (2012). Olfaction in a hunter-gatherer society: Insights from language and culture. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1155-1160). Austin, TX: Cognitive Science Society.

    Abstract

    According to a view widely held among scholars, olfaction is inferior to the other human senses. Many also believe that languages do not have words for describing smells. Data collected among the Maniq, a small population of nomadic foragers in southern Thailand, challenge these claims and point to a great linguistic and cultural elaboration of odor. This article presents evidence of the importance of olfaction in indigenous rituals and beliefs, as well as in the lexicon. The results demonstrate the richness and complexity of the domain of smell in Maniq society and thereby challenge the supposed universal paucity of olfactory terms and insignificance of olfaction for humans.
  • Zampieri, M., & Gebre, B. G. (2012). Automatic identification of language varieties: The case of Portuguese. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, September 19-21, 2012, Vienna (pp. 233-237). Vienna: Österreichische Gesellschaft für Artificial Intelligence (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on character n-gram models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Accuracy reached 0.998 using character 4-grams.
  • Zampieri, M., Gebre, B. G., & Diwersy, S. (2012). Classifying pluricentric languages: Extending the monolingual model. In Proceedings of SLTC 2012. The Fourth Swedish Language Technology Conference. Lund, October 24-26, 2012 (pp. 79-80). Lund University.

    Abstract

    This study presents a new language identification model for pluricentric languages that uses n-gram language models at the character and word level. The model is evaluated in two steps. The first step consists of the identification of two varieties of Spanish (Argentina and Spain) and two varieties of French (Quebec and France) evaluated independently in binary classification schemes. The second step integrates these language models in a six-class classification with two Portuguese varieties.
  • De Zubicaray, G. I., Acheson, D. J., & Hartsuiker, R. J. (Eds.). (2013). Mind what you say - general and specific mechanisms for monitoring in speech production [Research topic] [Special Issue]. Frontiers in Human Neuroscience. Retrieved from http://www.frontiersin.org/human_neuroscience/researchtopics/mind_what_you_say_-_general_an/1197.

    Abstract

    Psycholinguistic research has typically portrayed speech production as a relatively automatic process. This is because errors occur as seldom as once in every thousand words we utter. However, it has long been recognised that we need some form of control over what we are currently saying and what we plan to say. This capacity to both monitor our inner speech and self-correct our speech output has often been assumed to be a property of the language comprehension system. More recently, it has been demonstrated that speech production benefits from interfacing with more general cognitive processes such as selective attention, short-term memory (STM) and online response monitoring to resolve potential conflict and successfully produce the output of a verbal plan. The conditions and levels of representation according to which these more general planning, monitoring and control processes are engaged during speech production remain poorly understood. Moreover, there remains a paucity of information about their neural substrates, despite some of the first evidence of more general monitoring having come from electrophysiological studies of error-related negativities (ERNs). While aphasic speech errors continue to be a rich source of information, there has been comparatively little research focus on instances of speech repair. The purpose of this Frontiers Research Topic is to provide a forum for researchers to contribute investigations employing behavioural, neuropsychological, electrophysiological, neuroimaging and virtual lesioning techniques. In addition, while the focus of the research topic is on novel findings, we welcome submission of computational simulations, review articles and methods papers.