Defining and counting phonological classes in cross-linguistic segment databases
Dediu, D., & Moisik, S.
Defining and counting phonological classes in cross-linguistic segment databases. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.),
Proceedings of LREC 2016: 10th International Conference on Language Resources and Evaluation
(pp. 1955-1962). Paris: European Language Resources Association (ELRA).
Recently, there has been an explosion in the availability of large, good-quality cross-linguistic databases such as WALS
(Dryer & Haspelmath, 2013), Glottolog (Hammarström et al., 2015) and Phoible (Moran & McCloy, 2014). Databases
such as Phoible contain the actual segments used by various languages as they are given in the primary language
descriptions. However, this segment-level representation cannot be used directly for analyses that require generalizations
over classes of segments that share theoretically interesting features. Here we present a method and the associated
R (R Core Team, 2014) code that allows the flexible definition of such meaningful classes and that can identify the
sets of segments falling into such a class for any language inventory. The method and its results are important for
those interested in exploring cross-linguistic patterns of phonetic and phonological diversity and their relationship to
extra-linguistic factors and processes such as climate, economics, history or human genetics.
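
The general idea of defining a class once and then finding its members in any inventory can be illustrated with a minimal sketch. The sketch is written in Python for illustration only and is not the authors' R code. The feature table, the class names, and the functions segments_in_class and count_classes are all hypothetical; the feature values are hand-written and not taken from Phoible.

```python
from typing import Callable, Dict, List

# Hypothetical, hand-written feature table (an assumption, not Phoible data).
FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"type": "consonant", "manner": "stop", "voice": "voiceless"},
    "b": {"type": "consonant", "manner": "stop", "voice": "voiced"},
    "m": {"type": "consonant", "manner": "nasal", "voice": "voiced"},
    "s": {"type": "consonant", "manner": "fricative", "voice": "voiceless"},
    "a": {"type": "vowel", "height": "low"},
    "i": {"type": "vowel", "height": "high"},
}

# A class is modelled here as a predicate over a segment's features.
SegmentClass = Callable[[Dict[str, str]], bool]

# Hypothetical class definitions, chosen only for illustration.
CLASSES: Dict[str, SegmentClass] = {
    "voiced_stops": lambda f: f.get("manner") == "stop" and f.get("voice") == "voiced",
    "vowels": lambda f: f.get("type") == "vowel",
}


def segments_in_class(inventory: List[str], cls: SegmentClass) -> List[str]:
    """Return the segments of an inventory that satisfy the class predicate.

    Segments missing from the feature table are skipped.
    """
    return [s for s in inventory if s in FEATURES and cls(FEATURES[s])]


def count_classes(inventory: List[str]) -> Dict[str, int]:
    """Count, for each defined class, how many inventory segments fall into it."""
    return {name: len(segments_in_class(inventory, cls)) for name, cls in CLASSES.items()}


# A toy inventory for a made-up language.
inventory = ["p", "b", "m", "a", "i"]
voiced_stops = segments_in_class(inventory, CLASSES["voiced_stops"])
counts = count_classes(inventory)
```

Treating each class as a predicate keeps the class definitions separate from the inventories, so the same definition can be applied unchanged to every language in a database.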