Publications

Levshina, N., Koptjevskaja-Tamm, M., & Östling, R. (2024). Revered and reviled: A sentiment analysis of female and male referents in three languages. Frontiers in Communication, 9: 1266407. doi:10.3389/fcomm.2024.1266407.

Abstract
Our study contributes to the less explored domain of lexical typology, focusing on semantic prosody and connotation. Semantic derogation, or pejoration of nouns referring to women, whereby such words acquire connotations and further denotations of social pejoration, immorality and/or loose sexuality, has been a very prominent question in studies on gender and language (change). It has been argued that pejoration emerges due to the general derogatory attitudes toward female referents. However, the evidence for systematic differences in connotations of female- vs. male-related words is fragmentary and often fairly impressionistic; moreover, many researchers argue that expressed sentiments toward women (as well as men) often are ambivalent. One should also expect gender differences in connotations to have decreased in the recent years, thanks to the advances of feminism and social progress. We test these ideas in a study of positive and negative connotations of feminine and masculine term pairs such as woman - man, girl - boy, wife - husband, etc. Sentences containing these words were sampled from diachronic corpora of English, Chinese and Russian, and sentiment scores for every word were obtained using two systems for Aspect-Based Sentiment Analysis: PyABSA, and OpenAI’s large language model GPT-3.5. The Generalized Linear Mixed Models of our data provide no indications of significantly more negative sentiment toward female referents in comparison with their male counterparts. However, some of the models suggest that female referents are more infrequently associated with neutral sentiment than male ones. Neither do our data support the hypothesis of the diachronic convergence between the genders. In sum, results suggest that pejoration is unlikely to be explained simply by negative attitudes to female referents in general.

Additional information
supplementary materials

De Hoop, H., Levshina, N., & Segers, M. (2023). The effect of the use of T and V pronouns in Dutch HR communication. Journal of Pragmatics, 203, 96-109. doi:10.1016/j.pragma.2022.11.017.

Abstract
In an online experiment among native speakers of Dutch we measured addressees' responses to emails written in the informal pronoun T or the formal pronoun V in HR communication. 172 participants (61 male, mean age 37 years) read either the V-versions or the T-versions of two invitation emails and two rejection emails by four different fictitious recruiters. After each email, participants had to score their appreciation of the company and the recruiter on five different scales each, such as The recruiter who wrote this email seems … [scale from friendly to unfriendly]. We hypothesized that (i) the V-pronoun would be more appreciated in letters of rejection, and the T-pronoun in letters of invitation, and (ii) older people would appreciate the V-pronoun more than the T-pronoun, and the other way around for younger people. Although neither of these hypotheses was supported, we did find a small effect of pronoun: Emails written in V were more highly appreciated than emails in T, irrespective of type of email (invitation or rejection), and irrespective of the participant's age, gender, and level of education. At the same time, we observed differences in the strength of this effect across different scales.

Levshina, N. (2023). Communicative efficiency: Language structure and use. Cambridge: Cambridge University Press.

Abstract
All living beings try to save effort, and humans are no exception. This groundbreaking book shows how we save time and energy during communication by unconsciously making efficient choices in grammar, lexicon and phonology. It presents a new theory of 'communicative efficiency', the idea that language is designed to be as efficient as possible, as a system of communication. The new framework accounts for the diverse manifestations of communicative efficiency across a typologically broad range of languages, using various corpus-based and statistical approaches to explain speakers' bias towards efficiency. The author's unique interdisciplinary expertise allows her to provide rich evidence from a broad range of language sciences. She integrates diverse insights from over a hundred years of research into this comprehensible new theory, which she presents step-by-step in clear and accessible language. It is essential reading for language scientists, cognitive scientists and anyone interested in language use and communication.

Levshina, N., Namboodiripad, S., Allassonnière-Tang, M., Kramer, M., Talamo, L., Verkerk, A., Wilmoth, S., Garrido Rodriguez, G., Gupton, T. M., Kidd, E., Liu, Z., Naccarato, C., Nordlinger, R., Panova, A., & Stoynova, N. (2023). Why we need a gradient approach to word order. Linguistics, 61(4), 825-883. doi:10.1515/ling-2021-0098.

Abstract
This article argues for a gradient approach to word order, which treats word order preferences, both within and across languages, as a continuous variable. Word order variability should be regarded as a basic assumption, rather than as something exceptional. Although this approach follows naturally from the emergentist usage-based view of language, we argue that it can be beneficial for all frameworks and linguistic domains, including language acquisition, processing, typology, language contact, language evolution and change, and formal approaches. Gradient approaches have been very fruitful in some domains, such as language processing, but their potential is not fully realized yet. This may be due to practical reasons. We discuss the most pressing methodological challenges in corpus-based and experimental research of word order and propose some practical solutions.

Additional information
The datasets and code used for the quantitative case studies can be found in th…

Levshina, N. (2023). Testing communicative and learning biases in a causal model of language evolution: A study of cues to Subject and Object. In M. Degano, T. Roberts, G. Sbardolini, & M. Schouwstra (Eds.), The Proceedings of the 23rd Amsterdam Colloquium (pp. 383-387). Amsterdam: University of Amsterdam.

Levshina, N. (2023). Word classes in corpus linguistics. In E. Van Lier (Ed.), The Oxford handbook of word classes (pp. 833-850). Oxford: Oxford University Press. doi:10.1093/oxfordhb/9780198852889.013.34.

Abstract
Word classes play a central role in corpus linguistics under the name of parts of speech (POS). Many popular corpora are provided with POS tags. This chapter gives examples of popular tagsets and discusses the methods of automatic tagging. It also considers bottom-up approaches to POS induction, which are particularly important for the ‘poverty of stimulus’ debate in language acquisition research. The choice of optimal POS tagging involves many difficult decisions, which are related to the level of granularity, redundancy at different levels of corpus annotation, cross-linguistic applicability, language-specific descriptive adequacy, and dealing with fuzzy boundaries between POS. The chapter also discusses the problem of flexible word classes and demonstrates how corpus data with POS tags and syntactic dependencies can be used to quantify the level of flexibility in a language.

Levshina, N. (2022). Frequency, informativity and word length: Insights from typologically diverse corpora. Entropy, 24(2): 280. doi:10.3390/e24020280.

Abstract
Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently, depending on several methodological choices. The present study examines a more diverse sample of languages than the previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given previous word and informativity given next word, applying different methods of bigram processing. The results show different correlations between word length and the corpus-based measure for different languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.

Additional information
datasets

Levshina, N., & Hawkins, J. A. (2022). Verb-argument lability and its correlations with other typological parameters. A quantitative corpus-based study. Linguistic Typology at the Crossroads, 2(1), 94-120. doi:10.6092/issn.2785-0943/13861.

Abstract
We investigate the correlations between A- and P-lability for verbal arguments with other typological parameters using large, syntactically annotated corpora of online news in 28 languages. To estimate how much lability is observed in a language, we measure associations between Verbs or Verb + Noun combinations and the alternating constructions in which they occur. Our correlational analyses show that high P-lability scores correlate strongly with the following parameters: little or no case marking; weaker associations between lexemes and the grammatical roles A and P; rigid order of Subject and Object; and a high proportion of verb-medial clauses (SVO). Low P-lability correlates with the presence of case marking, stronger associations between nouns and grammatical roles, relatively flexible ordering of Subject and Object, and verb-final order. As for A-lability, it is not correlated with any other parameters. A possible reason is that A-lability is a result of more universal discourse processes, such as deprofiling of the object, and also exhibits numerous lexical and semantic idiosyncrasies. The fact that P-lability is strongly correlated with other parameters can be interpreted as evidence for a more general typology of languages, in which some tend to have highly informative morphosyntactic and lexical cues, whereas others rely predominantly on contextual environment, which is possibly due to fixed word order. We also find that P-lability is more strongly correlated with the other parameters than any of these parameters are with each other, which means that it can be a very useful typological variable.

Levshina, N., & Lorenz, D. (2022). Communicative efficiency and the Principle of No Synonymy: Predictability effects and the variation of want to and wanna. Language and Cognition, 14(2), 249-274. doi:10.1017/langcog.2022.7.

Abstract
There is ample psycholinguistic evidence that speakers behave efficiently, using shorter and less effortful constructions when the meaning is more predictable, and longer and more effortful ones when it is less predictable. However, the Principle of No Synonymy requires that all formally distinct variants should also be functionally different. The question is how much two related constructions should overlap semantically and pragmatically in order to be used for the purposes of efficient communication. The case study focuses on want to + Infinitive and its reduced variant with wanna, which have different stylistic and sociolinguistic connotations. Bayesian mixed-effects regression modelling based on the spoken part of the British National Corpus reveals a very limited effect of efficiency: predictability increases the chances of the reduced variant only in fast speech. We conclude that efficient use of more and less effortful variants is restricted when two variants are associated with different registers or styles. This paper also pursues a methodological goal regarding missing values in speech corpora. We impute missing data based on the existing values. A comparison of regression models with and without imputed values reveals similar tendencies. This means that imputation is useful for dealing with missing values in corpora.

Additional information
supplementary materials

Levshina, N. (2022). Semantic maps of causation: New hybrid approaches based on corpora and grammar descriptions. Zeitschrift für Sprachwissenschaft, 41(1), 179-205. doi:10.1515/zfs-2021-2043.

Abstract
The present paper discusses connectivity and proximity maps of causative constructions and combines them with different types of typological data. In the first case study, I show how one can create a connectivity map based on a parallel corpus. This allows us to solve many problems, such as incomplete descriptions, inconsistent terminology and the problem of determining the semantic nodes. The second part focuses on proximity maps based on Multidimensional Scaling and compares the most important semantic distinctions, which are inferred from a parallel corpus of film subtitles and from grammar descriptions. The results suggest that corpus-based maps of tokens are more sensitive to cultural and genre-related differences in the prominence of specific causation scenarios than maps based on constructional types, which are described in reference grammars. The grammar-based maps also reveal a less clear structure, which can be due to incomplete semantic descriptions in grammars. Therefore, each approach has its shortcomings, which researchers need to be aware of.

Additional information
datasets and R code for the statistical analysis of the case studies

Levshina, N. (2022). Corpus-based typology: Applications, challenges and some solutions. Linguistic Typology, 26(1), 129-160. doi:10.1515/lingty-2020-0118.

Abstract
Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style, that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks, being particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes has not only advantages and opportunities, but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept.

Levshina, N. (2022). Comparing Bayesian and frequentist models of language variation: The case of help + (to) Infinitive. In O. Schützler, & J. Schlüter (Eds.), Data and methods in corpus linguistics – Comparative Approaches (pp. 224-258). Cambridge: Cambridge University Press.

Levshina, N. (2021). Cross-linguistic trade-offs and causal relationships between cues to grammatical subject and object, and the problem of efficiency-related explanations. Frontiers in Psychology, 12: 648200. doi:10.3389/fpsyg.2021.648200.

Abstract
Cross-linguistic studies focus on inverse correlations (trade-offs) between linguistic variables that reflect different cues to linguistic meanings. For example, if a language has no case marking, it is likely to rely on word order as a cue for identification of grammatical roles. Such inverse correlations are interpreted as manifestations of language users’ tendency to use language efficiently. The present study argues that this interpretation is problematic. Linguistic variables, such as the presence of case, or flexibility of word order, are aggregate properties, which do not represent the use of linguistic cues in context directly. Still, such variables can be useful for circumscribing the potential role of communicative efficiency in language evolution, if we move from cross-linguistic trade-offs to multivariate causal networks. This idea is illustrated by a case study of linguistic variables related to four types of Subject and Object cues: case marking, rigid word order of Subject and Object, tight semantics and verb-medial order. The variables are obtained from online language corpora in thirty languages, annotated with the Universal Dependencies. The causal model suggests that the relationships between the variables can be explained predominantly by sociolinguistic factors, leaving little space for a potential impact of efficient linguistic behavior.

Levshina, N. (2021). Conditional inference trees and random forests. In M. Paquot, & T. Gries (Eds.), Practical Handbook of Corpus Linguistics (pp. 611-643). New York: Springer.

Levshina, N., & Moran, S. (2021). Efficiency in human languages: Corpus evidence for universal principles. Linguistics Vanguard, 7(s3): 20200081. doi:10.1515/lingvan-2020-0081.

Abstract
Over the last few years, there has been a growing interest in communicative efficiency. It has been argued that language users act efficiently, saving effort for processing and articulation, and that language structure and use reflect this tendency. The emergence of new corpus data has brought to life numerous studies on efficient language use in the lexicon, in morphosyntax, and in discourse and phonology in different languages. In this introductory paper, we discuss communicative efficiency in human languages, focusing on evidence of efficient language use found in multilingual corpora. The evidence suggests that efficiency is a universal feature of human language. We provide an overview of different manifestations of efficiency on different levels of language structure, and we discuss the major questions and findings so far, some of which are addressed for the first time in the contributions in this special collection.

Levshina, N., & Moran, S. (Eds.). (2021). Efficiency in human languages: Corpus evidence for universal principles [Special Issue]. Linguistics Vanguard, 7(s3).

Levshina, N. (2021). Communicative efficiency and differential case marking: A reverse-engineering approach. Linguistics Vanguard, 7(s3): 20190087. doi:10.1515/lingvan-2019-0087.

Levshina, N. (2020). How tight is your language? A semantic typology based on Mutual Information. In K. Evang, L. Kallmeyer, R. Ehren, S. Petitjean, E. Seyffarth, & D. Seddah (Eds.), Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories (pp. 70-78). Düsseldorf, Germany: Association for Computational Linguistics. doi:10.18653/v1/2020.tlt-1.7.

Abstract
Languages differ in the degree of semantic flexibility of their syntactic roles. For example, English and Indonesian are considered more flexible with regard to the semantics of subjects, whereas German and Japanese are less flexible. In Hawkins’ classification, more flexible languages are said to have a loose fit, and less flexible ones are those that have a tight fit. This classification has been based on manual inspection of example sentences. The present paper proposes a new, quantitative approach to deriving the measures of looseness and tightness from corpora. We use corpora of online news from the Leipzig Corpora Collection in thirty typologically and genealogically diverse languages and parse them syntactically with the help of the Universal Dependencies annotation software. Next, we compute Mutual Information scores for each language using the matrices of lexical lemmas and four syntactic dependencies (intransitive subjects, transitive subject, objects and obliques). The new approach allows us not only to reproduce the results of previous investigations, but also to extend the typology to new languages. We also demonstrate that verb-final languages tend to have a tighter relationship between lexemes and syntactic roles, which helps language users to recognize thematic roles early during comprehension.

Additional information
full text via ACL website

Levshina, N. (2020). Efficient trade-offs as explanations in functional linguistics: some problems and an alternative proposal. Revista da Abralin, 19(3), 50-78. doi:10.25189/rabralin.v19i3.1728.

Abstract
The notion of efficient trade-offs is frequently used in functional linguistics in order to explain language use and structure. In this paper I argue that this notion is more confusing than enlightening. Not every negative correlation between parameters represents a real trade-off. Moreover, trade-offs are usually reported between pairs of variables, without taking into account the role of other factors. These and other theoretical issues are illustrated in a case study of linguistic cues used in expressing “who did what to whom”: case marking, rigid word order and medial verb position. The data are taken from the Universal Dependencies corpora in 30 languages and annotated corpora of online news from the Leipzig Corpora Collection. We find that not all cues are correlated negatively, which questions the assumption of language as a zero-sum game. Moreover, the correlations between pairs of variables change when we incorporate the third variable. Finally, the relationships between the variables are not always bi-directional. The study also presents a causal model, which can serve as a more appropriate alternative to trade-offs.

Levshina, N. (2019). Token-based typology and word order entropy: A study based on universal dependencies. Linguistic Typology, 23(3), 533-572. doi:10.1515/lingty-2019-0025.

Abstract
The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach. This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks. The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.

Additional information
lingty-2019-0025ad.zip

Levshina, N. (2018). Probabilistic grammar and constructional predictability: Bayesian generalized additive models of help. Glossa: a journal of general linguistics, 3(1): 55. doi:10.5334/gjgl.294.

Abstract
The present study investigates the construction with help followed by the bare or to-infinitive in seven varieties of web-based English from Australia, Ghana, Great Britain, Hong Kong, India, Jamaica and the USA. In addition to various factors known from the literature, such as register, minimization of cognitive complexity and avoidance of identity (horror aequi), it studies the effect of predictability of the infinitive given help and the other way round on the language user’s choice between the constructional variants. These probabilistic constraints are tested in a series of Bayesian generalized additive mixed-effects regression models. The results demonstrate that the to-infinitive is particularly frequent in contexts with low predictability, or, in information-theoretic terms, with high information content. This tendency is interpreted as communicatively efficient behaviour, when more predictable units of discourse get less formal marking, and less predictable ones get more formal marking. However, the strength, shape and directionality of predictability effects exhibit variation across the countries, which demonstrates the importance of the cross-lectal perspective in research on communicative efficiency and other universal functional principles.

Levshina, N. (2016). When variables align: A Bayesian multinomial mixed-effects model of English permissive constructions. Cognitive Linguistics, 27(2), 235-268. doi:10.1515/cog-2015-0054.

Gast, V., & Levshina, N. (2014). Motivating w(h)-Clefts in English and German: A hypothesis-driven parallel corpus study. In A.-M. De Cesare (Ed.), Frequency, Forms and Functions of Cleft Constructions in Romance and Germanic: Contrastive, Corpus-Based Studies (pp. 377-414). Berlin: De Gruyter.

Levshina, N., Geeraerts, D., & Speelman, D. (2013). Mapping constructional spaces: A contrastive analysis of English and Dutch analytic causatives. Linguistics, 51(4), 825-854. doi:10.1515/ling-2013-0028.

Abstract
The paper demonstrates how verb and noun classes can be used as a common interface in contrastive Construction Grammar. It presents an innovative approach to the contrastive analysis of constructional spaces (sets of constructions covering a certain semantic domain). We compare English and Dutch analytic causatives by using the statistical technique of multiple correspondence analysis applied to data from large monolingual corpora. The method allows us to explore the common conceptual space of the constructions, in particular the salient semantic dimensions and causation types, which emerge on the basis of co-occurring semantic classes of the nominal and verbal slot fillers in constructional exemplars. The formal patterns of the constructions at different levels of specificity are projected onto this space. Our analyses show that an average Dutch analytic causative refers to more indirect and abstract causation with fewer animate than its English counterpart. We have also found that the languages “cut” the common conceptual space in unique ways, although the semantic areas of many English and Dutch constructions overlap substantially. Nevertheless, the form-meaning mapping in the two languages displays commonalities. Both English and Dutch constructions with prepositionally marked or implicit causees are strongly associated with animate causees. We have also observed a correlation between the directness of causation and the crosslinguistic hierarchy of affectedness marking proposed by Kemmer and Verhagen (1994).

Levshina, N., Geeraerts, D., & Speelman, D. (2013). Towards a 3D-grammar: Interaction of linguistic and extralinguistic factors in the use of Dutch causative constructions. Journal of Pragmatics, 52, 34-48. doi:10.1016/j.pragma.2012.12.013.

Abstract
The integration of three main dimensions of linguistic usage and variation – formal, social and conceptual – can be seen as a major ambition of the Cognitive Sociolinguistics enterprise. The paper illustrates this theoretical approach with a corpus-based study of near-synonymous causative constructions with doen and laten in the Belgian and Netherlandic varieties of Dutch. A series of quantitative analyses show a complex interplay of the dimensions at different levels of constructional schematicity. At the more schematic level, the results indicate that the effects of transitivity and coreferentiality on the probability of the two constructions are slightly different in the two varieties. However, incorporating the effected predicate slot fillers in a mixed-effect model reveals that these differences can be explained to a large extent by the country-specific lexical patterns. These findings suggest that the interplay of the lectal and conceptual factors in constructional variation should be studied at varying degrees of constructional schematicity.
