Publications

  • Terrill, A. (2003). A grammar of Lavukaleve. Berlin: Mouton de Gruyter.
  • Tice, M., & Henetz, T. (2011). Turn-boundary projection: Looking ahead. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 838-843). Austin, TX: Cognitive Science Society.

    Abstract

    Coordinating with others is hard; and yet we accomplish this every day when we take turns in a conversation. How do we do this? The present study introduces a new method of measuring turn-boundary projection that enables researchers to achieve more valid, flexible, and temporally informative data on online turn projection: tracking an observer’s gaze from the current speaker to the next speaker. In this preliminary investigation, participants consistently looked at the current speaker during their turn. Additionally, they looked to the next speaker before her turn began, and sometimes even before the current speaker finished speaking. This suggests that observer gaze is closely aligned with perceptual processes of turn-boundary projection, and thus may equip the field with the tools to explore how we manage to take turns.
  • Tschöpel, S., Schneider, D., Bardeli, R., Schreer, O., Masneri, S., Wittenburg, P., Sloetjes, H., Lenkiewicz, P., & Auer, E. (2011). AVATecH: Audio/Video technology for humanities research. In C. Vertan, M. Slavcheva, P. Osenova, & S. Piperidis (Eds.), Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage, Hissar, Bulgaria, 16 September 2011 (pp. 86-89). Shoumen, Bulgaria: Incoma Ltd.

    Abstract

    In the AVATecH project the Max-Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS aim to significantly speed up the process of creating annotations of audio-visual data for humanities research. For this we integrate state-of-the-art audio and video pattern recognition algorithms into the widely used ELAN annotation tool. To address the problem of heterogeneous annotation tasks and recordings we provide modular components extended by adaptation and feedback mechanisms to achieve competitive annotation quality within significantly less annotation time. Currently we are designing a large-scale end-user evaluation of the project.
  • Tuinman, A., Mitterer, H., & Cutler, A. (2011). The efficiency of cross-dialectal word recognition. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 153-156).

    Abstract

    Dialects of the same language can differ in the casual speech processes they allow; e.g., British English allows the insertion of [r] at word boundaries in sequences such as saw ice, while American English does not. In two speeded word recognition experiments, American listeners heard such British English sequences; in contrast to non-native listeners, they accurately perceived intended vowel-initial words even with intrusive [r]. Thus despite input mismatches, cross-dialectal word recognition benefits from the full power of native-language processing.
  • Turco, G., Gubian, M., & Schertz, J. (2011). A quantitative investigation of the prosody of Verum Focus in Italian. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 961-964).

    Abstract

    prosodic marking of Verum focus (VF) in Italian, which is said to be realized with a pitch accent on the finite verb (e.g. A: Paul has not eaten the banana - B: (No), Paul HAS eaten the banana!). We tried to discover whether and how Italian speakers prosodically mark VF when producing full-fledged sentences using a semi-spontaneous production experiment on 27 speakers. Speech rate and f0 contours were extracted using automatic data processing tools and were subsequently analysed using Functional Data Analysis (FDA), which allowed for automatic visualization of patterns in the contour shapes. Our results show that the postfocal region of VF sentences exhibits faster speech rate and lower f0 compared to non-VF cases. However, an expected consistent difference of f0 effect on the focal region of the VF sentence was not found in this analysis.
  • Turco, G., & Gubian, M. (2012). L1 Prosodic transfer and priming effects: A quantitative study on semi-spontaneous dialogues. In Q. Ma, H. Ding, & D. Hirst (Eds.), Proceedings of the 6th International Conference on Speech Prosody (pp. 386-389). International Speech Communication Association (ISCA).

    Abstract

    This paper represents a pilot investigation of primed accentuation patterns produced by advanced Dutch speakers of Italian as a second language (L2). Contrastive accent patterns within prepositional phrases were elicited in a semi-spontaneous dialogue entertained with a confederate native speaker of Italian. The aim of the analysis was to compare learners’ contrastive accentual configurations induced by the confederate speaker’s prime against those produced by Italian and Dutch natives in the same testing conditions. F0 and speech rate data were analysed by applying powerful data-driven techniques available in the Functional Data Analysis statistical framework. Results reveal different accentual configurations in L1 and L2 Italian in response to the confederate’s prime. We conclude that learners’ accentual patterns mirror those produced by their L1 control group (prosodic-transfer hypothesis) although the hypothesis of a transient priming effect on learners’ choice of contrastive patterns cannot be completely ruled out.
  • Van Hout, A., Veenstra, A., & Berends, S. (2011). All pronouns are not acquired equally in Dutch: Elicitation of object and quantitative pronouns. In M. Pirvulescu, M. C. Cuervo, A. T. Pérez-Leroux, J. Steele, & N. Strik (Eds.), Selected proceedings of the 4th Conference on Generative Approaches to Language Acquisition North America (GALANA 2010) (pp. 106-121). Somerville, MA: Cascadilla Proceedings Project.

    Abstract

    This research reports the results of eliciting pronouns in two syntactic environments: Object pronouns and quantitative er (Q-er). Thus another type of language is added to the literature on subject and object clitic acquisition in the Romance languages (Jakubowicz et al., 1998; Hamann et al., 1996). Quantitative er is a unique pronoun in the Germanic languages; it has the same distribution as partitive clitics in Romance. Q-er is an N'-anaphor and occurs obligatorily with headless noun phrases with a numeral or weak quantifier. Q-er is licensed only when the context offers an antecedent; it binds an empty position in the NP. Data from typically-developing children aged 5;0-6;0 show that object and Q-er pronouns are not acquired equally; it is proposed that this is due to their different syntax. The use of Q-er involves more sophisticated syntactic knowledge: Q-er occurs at the left edge of the VP and binds an empty position in the NP, whereas object pronouns are simply stand-ins for full NPs and occur in the same position. These Dutch data reveal that pronouns are not used as exclusively as object clitics are in the Romance languages (Varlokosta, in prep.).
  • Van Ooijen, B., Cutler, A., & Bertinetto, P. M. (1993). Click detection in Italian and English. In Eurospeech 93: Vol. 1 (pp. 681-684). Berlin: ESCA.

    Abstract

    We report four experiments in which English and Italian monolinguals detected clicks in continuous speech in their native language. Two of the experiments used an off-line location task, and two used an on-line reaction time task. Despite there being large differences between English and Italian with respect to rhythmic characteristics, very similar response patterns were found for the two language groups. It is concluded that the process of click detection operates independently from language-specific differences in perceptual processing at the sublexical level.
  • Van Gijn, R., Haude, K., & Muysken, P. (Eds.). (2011). Subordination in native South American languages. Amsterdam: Benjamins.

    Abstract

    In terms of its linguistic and cultural make-up, the continent of South America provides linguists and anthropologists with a complex puzzle of language diversity. The continent teems with small language families and isolates, and even languages spoken in adjacent areas can be typologically vastly different from each other. This volume intends to provide a taste of the linguistic diversity found in South America within the area of clause subordination. The potential variety in the strategies that languages can use to encode subordinate events is enormous, yet there are clearly dominant patterns to be discerned: switch reference marking, clause chaining, nominalization, and verb serialization. The book also contributes to the continuing debate on the nature of syntactic complexity, as evidenced in subordination.
  • Van Uytvanck, D., Stehouwer, H., & Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 1029-1034). European Language Resources Association (ELRA).

    Abstract

    In this paper we present the Virtual Language Observatory (VLO), a metadata-based portal for language resources. It is completely based on the Component Metadata (CMDI) and ISOcat standards. This approach allows for the use of heterogeneous metadata schemas while maintaining the semantic compatibility. We describe the metadata harvesting process, based on OAI-PMH, and the conversion from several formats (OLAC, IMDI and the CLARIN LRT inventory) to their CMDI counterpart profiles. Then we focus on some post-processing steps to polish the harvested records. Next, the ingestion of the CMDI files into the VLO facet browser is described. We also include an overview of the changes since the first version of the VLO, based on user feedback from the CLARIN community. Finally there is an overview of additional ideas and improvements for future versions of the VLO.
  • Van Berkum, J. J. A. (2011). Zonder gevoel geen taal [Inaugural lecture].

    Abstract

    Research on language and communication has in the past focused far too much on language as a system for encoding messages, a kind of TCP/IP (a network protocol for communication between computers). It is time for that to change, argues Prof. Dr. Jos van Berkum, professor of Discourse, Cognition and Communication, in the inaugural lecture he will deliver on 30 September at Utrecht University. He calls for more research into the close intertwining of language and feeling.
  • Vapnarsky, V., & Le Guen, O. (2011). The guardians of space: Understanding ecological and historical relations of the contemporary Yucatec Mayas to their landscape. In C. Isendahl, & B. Liljefors Persson (Eds.), Ecology, Power, and Religion in Maya Landscapes: Proceedings of the 11th European Maya Conference. Acta Mesoamericana, Vol. 23. Markt Schwaben: Saurwein.
  • Versteegh, M., Ten Bosch, L., & Boves, L. (2011). Modelling novelty preference in word learning. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 761-764).

    Abstract

    This paper investigates the effects of novel words on a cognitively plausible computational model of word learning. The model is first familiarized with a set of words, achieving high recognition scores and subsequently offered novel words for training. We show that the model is able to recognize the novel words as different from the previously seen words, based on a measure of novelty that we introduce. We then propose a procedure analogous to novelty preference in infants. Results from simulations of word learning show that adding this procedure to our model speeds up training and helps the model attain higher recognition rates.
  • Verweij, H., Windhouwer, M., & Wittenburg, P. (2011). Knowledge management for small languages. In V. Luzar-Stiffler, I. Jarec, & Z. Bekic (Eds.), Proceedings of the ITI 2011 33rd Int. Conf. on Information Technology Interfaces, June 27-30, 2011, Cavtat, Croatia (pp. 213-218). Zagreb, Croatia: University Computing Centre, University of Zagreb.

    Abstract

    In this paper an overview of the knowledge components needed for extensive documentation of small languages is given. The Language Archive is striving to offer all these tools to the linguistic community. The major tools in relation to the knowledge components are described, followed by a discussion of what is currently lacking and possible strategies to move forward.
  • Viebahn, M. C., Ernestus, M., & McQueen, J. M. (2012). Co-occurrence of reduced word forms in natural speech. In Proceedings of INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association (pp. 2019-2022).

    Abstract

    This paper presents a corpus study that investigates the co-occurrence of reduced word forms in natural speech. We extracted Dutch past participles from three different speech registers and investigated the influence of several predictor variables on the presence and duration of schwas in prefixes and /t/s in suffixes. Our results suggest that reduced word forms tend to co-occur even if we partial out the effect of speech rate. The implications of our findings for episodic and abstractionist models of lexical representation are discussed.
  • Vuong, L., Meyer, A. S., & Christiansen, M. H. (2011). Simultaneous online tracking of adjacent and non-adjacent dependencies in statistical learning. In L. Carlson, C. Hölscher, & T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 964-969). Austin, TX: Cognitive Science Society.
  • Wagner, A., & Braun, A. (2003). Is voice quality language-dependent? Acoustic analyses based on speakers of three different languages. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 651-654). Adelaide: Causal Productions.
  • Wagner, M., Tran, D., Togneri, R., Rose, P., Powers, D., Onslow, M., Loakes, D., Lewis, T., Kuratate, T., Kinoshita, Y., Kemp, N., Ishihara, S., Ingram, J., Hajek, J., Grayden, D., Göcke, R., Fletcher, J., Estival, D., Epps, J., Dale, R., Cutler, A., Cox, F., Chetty, G., Cassidy, S., Butcher, A., Burnham, D., Bird, S., Best, C., Bennamoun, M., Arciuli, J., & Ambikairajah, E. (2011). The Big Australian Speech Corpus (The Big ASC). In M. Tabain, J. Fletcher, D. Grayden, J. Hajek, & A. Butcher (Eds.), Proceedings of the Thirteenth Australasian International Conference on Speech Science and Technology (pp. 166-170). Melbourne: ASSTA.
  • Warner, N. L., McQueen, J. M., Liu, P. Z., Hoffmann, M., & Cutler, A. (2012). Timing of perception for all English diphones [Abstract]. Program abstracts from the 164th Meeting of the Acoustical Society of America published in the Journal of the Acoustical Society of America, 132(3), 1967.

    Abstract

    Information in speech does not unfold discretely over time; perceptual cues are gradient and overlapped. However, this varies greatly across segments and environments: listeners cannot identify the affricate in /ptS/ until the frication, but information about the vowel in /li/ begins early. Unlike most prior studies, which have concentrated on subsets of language sounds, this study tests perception of every English segment in every phonetic environment, sampling perceptual identification at six points in time (13,470 stimuli/listener; 20 listeners). Results show that information about consonants after another segment is most localized for affricates (almost entirely in the release), and most gradual for voiced stops. In comparison to stressed vowels, unstressed vowels have less information spreading to neighboring segments and are less well identified. Indeed, many vowels, especially lax ones, are poorly identified even by the end of the following segment. This may partly reflect listeners’ familiarity with English vowels’ dialectal variability. Diphthongs and diphthongal tense vowels show the most sudden improvement in identification, similar to affricates among the consonants, suggesting that information about segments defined by acoustic change is highly localized. This large dataset provides insights into speech perception and data for probabilistic modeling of spoken word recognition.
  • Weber, A., & Smits, R. (2003). Consonant and vowel confusion patterns by American English listeners. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003) (pp. 1437-1440). Adelaide: Causal Productions.

    Abstract

    This study investigated the perception of American English phonemes by native listeners. Listeners identified either the consonant or the vowel in all possible English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0 dB, 8 dB, and 16 dB). Effects of syllable position, signal-to-noise ratio, and articulatory features on vowel and consonant identification are discussed. The results constitute the largest source of data that is currently available on phoneme confusion patterns of American English phonemes by native listeners.
  • Weber, A. (1998). Listening to nonnative language which violates native assimilation rules. In D. Duez (Ed.), Proceedings of the European Scientific Communication Association workshop: Sound patterns of Spontaneous Speech (pp. 101-104).

    Abstract

    Recent studies using phoneme detection tasks have shown that spoken-language processing is neither facilitated nor interfered with by optional assimilation, but is inhibited by violation of obligatory assimilation. Interpretation of these results depends on an assessment of their generality, specifically, whether they also obtain when listeners are processing nonnative language. Two separate experiments are presented in which native listeners of German and native listeners of Dutch had to detect a target fricative in legal monosyllabic Dutch nonwords. All of the nonwords were correct realisations in standard Dutch. For German listeners, however, half of the nonwords contained phoneme strings which violate the German fricative assimilation rule. Whereas the Dutch listeners showed no significant effects, German listeners detected the target fricative faster when the German fricative assimilation was violated than when no violation occurred. The results might suggest that violation of assimilation rules does not have to make processing more difficult per se.
  • Whorf, B. L. (2012). Language, thought, and reality: selected writings of Benjamin Lee Whorf [2nd ed.]: introduction by John B. Carroll; foreword by Stephen C. Levinson. (J. B. Carroll, S. C. Levinson, & P. Lee, Eds.). Cambridge, MA: MIT Press.

    Abstract

    The pioneering linguist Benjamin Whorf (1897–1941) grasped the relationship between human language and human thinking: how language can shape our innermost thoughts. His basic thesis is that our perception of the world and our ways of thinking about it are deeply influenced by the structure of the languages we speak. The writings collected in this volume include important papers on the Maya, Hopi, and Shawnee languages, as well as more general reflections on language and meaning. Whorf’s ideas about the relation of language and thought have always appealed to a wide audience, but their reception in expert circles has alternated between dismissal and applause. Recently the language sciences have headed in directions that give Whorf’s thinking a renewed relevance. Hence this new edition of Whorf’s classic work is especially timely. The second edition includes all the writings from the first edition as well as John Carroll’s original introduction, a new foreword by Stephen Levinson of the Max Planck Institute for Psycholinguistics that puts Whorf’s work in historical and contemporary context, and new indexes. In addition, this edition offers Whorf’s “Yale Report,” an important work from Whorf’s mature oeuvre.
  • Windhouwer, M., Broeder, D., & Van Uytvanck, D. (2012). A CMD core model for CLARIN web services. In Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 41-48).

    Abstract

    In the CLARIN infrastructure various national projects have started initiatives to allow users of the infrastructure to create chains or workflows of web services. The Component Metadata (CMD) core model for web services described in this paper tries to align the metadata descriptions of these various initiatives. This should allow chaining/workflow engines to find and invoke matching services. The paper describes the landscape of web services architectures and the state of the national initiatives. Based on this a CMD core model for CLARIN is proposed, which, within some limits, can be adapted to the specific needs of an initiative by the standard facilities of CMD. The paper closes with the current state and usage of the model and a look into the future.
  • Windhouwer, M. (2012). RELcat: a Relation Registry for ISOcat data categories. In N. Calzolari (Ed.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 3661-3664). European Language Resources Association (ELRA).

    Abstract

    The ISOcat Data Category Registry contains basically a flat and easily extensible list of data category specifications. To foster reuse and standardization only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains and to overcome possible proliferation of data categories more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows storing arbitrary relationships. These relationships can reflect the personal view of one linguist or a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. This allows on one hand to load existing sets of relations specified in, for example, an OWL (2) ontology or SKOS taxonomy. And on the other hand allows algorithms that query the registry to traverse the stored semantic network to remain ignorant of the original source vocabulary. This paper describes first experiences with RELcat and explains some initial design decisions.
  • Windhouwer, M. (2012). Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, (SFLR 2012 workshop), September 19-21, 2012, Vienna (p. 494). Vienna: Österreichische Gesellschaft für Artificial Intelligence (ÖGAI).

  • Withers, P. (2012). Metadata management with Arbil. In V. Arranz, D. Broeder, B. Gaiffe, M. Gavrilidou, & M. Monachini (Eds.), Proceedings of LREC 2012: 8th International Conference on Language Resources and Evaluation (pp. 72-75). European Language Resources Association (ELRA).

    Abstract

    Arbil is an application designed to create and manage metadata for research data and to arrange this data into a structure appropriate for archiving. The metadata is displayed in tables, which allows an overview of the metadata and the ability to populate and update many metadata sections in bulk. Both IMDI and Clarin metadata formats are supported and Arbil has been designed as a local application so that it can also be used offline, for instance in remote field sites. The metadata can be entered in any order or at any stage that the user is able; once the metadata and its data are ready for archiving and an Internet connection is available it can be exported from Arbil and in the case of IMDI it can then be transferred to the main archive via LAMUS (archive management and upload system).
  • Wittek, A. (1998). Learning verb meaning via adverbial modification: Change-of-state verbs in German and the adverb "wieder" again. In A. Greenhill, M. Hughes, H. Littlefield, & H. Walsh (Eds.), Proceedings of the 22nd Annual Boston University Conference on Language Development (pp. 779-790). Somerville, MA: Cascadilla Press.
  • Witteman, M. J., Bardhan, N. P., Weber, A., & McQueen, J. M. (2011). Adapting to foreign-accented speech: The role of delay in testing. Journal of the Acoustical Society of America. Program abstracts of the 162nd Meeting of the Acoustical Society of America, 130(4), 2443.

    Abstract

    Understanding speech usually seems easy, but it can become noticeably harder when the speaker has a foreign accent. This is because foreign accents add considerable variation to speech. Research on foreign-accented speech shows that participants are able to adapt quickly to this type of variation. Less is known, however, about longer-term maintenance of adaptation. The current study focused on long-term adaptation by exposing native listeners to foreign-accented speech on Day 1, and testing them on comprehension of the accent one day later. Comprehension was thus not tested immediately, but only after a 24 hour period. On Day 1, native Dutch listeners listened to the speech of a Hebrew learner of Dutch while performing a phoneme monitoring task that did not depend on the talker’s accent. In particular, shortening of the long vowel /i/ into /ɪ/ (e.g., lief [li:f], ‘sweet’, pronounced as [lɪf]) was examined. These mispronunciations did not create lexical ambiguities in Dutch. On Day 2, listeners participated in a cross-modal priming task to test their comprehension of the accent. The results will be contrasted with results from an experiment without delayed testing and related to accounts of how listeners maintain adaptation to foreign-accented speech.
  • Witteman, M. J., Weber, A., & McQueen, J. M. (2011). On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy (pp. 2229-2232).

    Abstract

    Foreign-accented speech is often perceived as more difficult to understand than native speech. What causes this potential difficulty, however, remains unknown. In the present study, we compared acoustic similarity and accent ratings of American-accented Dutch with a cross-modal priming task designed to measure online speech processing. We focused on two Dutch diphthongs: ui and ij. Though both diphthongs deviated from standard Dutch to varying degrees and perceptually varied in accent strength, native Dutch listeners recognized words containing the diphthongs easily. Thus, not all foreign-accented speech hinders comprehension, and acoustic similarity and perceived accentedness are not always predictive of processing difficulties.
  • Wittenburg, P., Lenkiewicz, P., Auer, E., Gebre, B. G., Lenkiewicz, A., & Drude, S. (2012). AV Processing in eHumanities - a paradigm shift. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts. University of Hamburg, Germany; July 16–22, 2012 (pp. 538-541).

    Abstract

    Speech research saw a dramatic change in paradigm in the 1990s. While the discussion had earlier been dominated by the approach of phoneticians, who knew about phenomena in the speech signal, the situation completely changed after stochastic machinery such as Hidden Markov Models [1] and Artificial Neural Networks [2] had been introduced. Speech processing was now dominated by a purely mathematical approach that basically ignored all existing knowledge about the speech production process and the perception mechanisms. The key was now to construct a large enough training set that would allow identifying the many free parameters of such stochastic engines. Provided that the training set is representative and its annotations are largely ‘correct’, one could expect to obtain a satisfactorily functioning recognizer. While the success of knowledge-based systems such as Hearsay II [3] was limited, the statistically based approach led to great improvements in recognition rates and to industrial applications.
  • Wnuk, E., & Majid, A. (2012). Olfaction in a hunter-gatherer society: Insights from language and culture. In N. Miyake, D. Peebles, & R. P. Cooper (Eds.), Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci 2012) (pp. 1155-1160). Austin, TX: Cognitive Science Society.

    Abstract

    According to a widely-held view among various scholars, olfaction is inferior to other human senses. It is also believed by many that languages do not have words for describing smells. Data collected among the Maniq, a small population of nomadic foragers in southern Thailand, challenge the above claims and point to a great linguistic and cultural elaboration of odor. This article presents evidence of the importance of olfaction in indigenous rituals and beliefs, as well as in the lexicon. The results demonstrate the richness and complexity of the domain of smell in Maniq society and thereby challenge the universal paucity of olfactory terms and insignificance of olfaction for humans.
  • Young, D., Altmann, G. T., Cutler, A., & Norris, D. (1993). Metrical structure and the perception of time-compressed speech. In Eurospeech 93: Vol. 2 (pp. 771-774).

    Abstract

    In the absence of explicitly marked cues to word boundaries, listeners tend to segment spoken English at the onset of strong syllables. This may suggest that under difficult listening conditions, speech should be easier to recognize where strong syllables are word-initial. We report two experiments in which listeners were presented with sentences which had been time-compressed to make listening difficult. The first study contrasted sentences in which all content words began with strong syllables with sentences in which all content words began with weak syllables. The intelligibility of the two groups of sentences did not differ significantly. Apparent rhythmic effects in the results prompted a second experiment; however, no significant effects of systematic rhythmic manipulation were observed. In both experiments, the strongest predictor of intelligibility was the rated plausibility of the sentences. We conclude that listeners' recognition responses to time-compressed speech may be strongly subject to experiential bias; effects of rhythmic structure are most likely to show up also as bias effects.
  • Zampieri, M., & Gebre, B. G. (2012). Automatic identification of language varieties: The case of Portuguese. In J. Jancsary (Ed.), Proceedings of the Conference on Natural Language Processing 2012, September 19-21, 2012, Vienna (pp. 233-237). Vienna: Österreichische Gesellschaft für Artificial Intelligence (ÖGAI).

    Abstract

    Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on n-gram character models to identify the correct language of texts, with good results seen for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Results reached 0.998 for accuracy using character 4-grams.
  • Zampieri, M., Gebre, B. G., & Diwersy, S. (2012). Classifying pluricentric languages: Extending the monolingual model. In Proceedings of SLTC 2012. The Fourth Swedish Language Technology Conference. Lund, October 24-26, 2012 (pp. 79-80). Lund University.

    Abstract

    This study presents a new language identification model for pluricentric languages that uses n-gram language models at the character and word level. The model is evaluated in two steps. The first step consists of the identification of two varieties of Spanish (Argentina and Spain) and two varieties of French (Quebec and France) evaluated independently in binary classification schemes. The second step integrates these language models in a six-class classification with two Portuguese varieties.
  • Zeshan, U., & De Vos, C. (Eds.). (2012). Sign languages in village communities: Anthropological and linguistic insights. Berlin: Mouton de Gruyter.

    Abstract

    The book is a unique collection of research on sign languages that have emerged in rural communities with a high incidence of, often hereditary, deafness. These sign languages represent the latest addition to the comparative investigation of languages in the gestural modality, and the book is the first compilation of a substantial number of different "village sign languages". Written by leading experts in the field, the volume uniquely combines anthropological and linguistic insights, looking at both the social dynamics and the linguistic structures in these village communities. The book includes primary data from eleven different signing communities across the world, including results from Jamaica, India, Turkey, Thailand, and Bali. All known village sign languages are endangered, usually because of pressure from larger urban sign languages, and some have died out already. Ironically, it is often the success of the larger sign language communities in urban centres, their recognition and subsequent spread, which leads to the endangerment of these small minority sign languages. The book addresses this specific type of language endangerment, documentation strategies, and other ethical issues pertaining to these sign languages on the basis of first-hand experiences by Deaf fieldworkers.
