Publications

  • Bergelson, E., Soderstrom, M., Schwarz, I.-C., Rowland, C. F., Ramírez-Esparza, N., Rague Hamrick, L., Marklund, E., Kalashnikova, M., Guez, A., Casillas, M., Benetti, L., Van Alphen, P. M., & Cristia, A. (2023). Everyday language input and production in 1,001 children from six continents. Proceedings of the National Academy of Sciences of the United States of America, 120(52): 2300671120. doi:10.1073/pnas.2300671120.

    Abstract

    Language is a universal human ability, acquired readily by young children, who otherwise struggle with many basics of survival. And yet, language ability is variable across individuals. Naturalistic and experimental observations suggest that children’s linguistic skills vary with factors like socioeconomic status and children’s gender. But which factors really influence children’s day-to-day language use? Here, we leverage speech technology in a big-data approach to report on a unique cross-cultural and diverse data set: >2,500 d-long, child-centered audio-recordings of 1,001 2- to 48-mo-olds from 12 countries spanning six continents across urban, farmer-forager, and subsistence-farming contexts. As expected, age and language-relevant clinical risks and diagnoses predicted how much speech (and speech-like vocalization) children produced. Critically, so too did adult talk in children’s environments: Children who heard more talk from adults produced more speech. In contrast to previous conclusions based on more limited sampling methods and a different set of language proxies, socioeconomic status (operationalized as maternal education) was not significantly associated with children’s productions over the first 4 y of life, and neither were gender or multilingualism. These findings from large-scale naturalistic data advance our understanding of which factors are robust predictors of variability in the speech behaviors of young learners in a wide range of everyday contexts.
  • Casillas, M., Brown, P., & Levinson, S. C. (2021). Early language experience in a Papuan community. Journal of Child Language, 48(4), 792-814. doi:10.1017/S0305000920000549.

    Abstract

    The rate at which young children are directly spoken to varies due to many factors, including (a) caregiver ideas about children as conversational partners and (b) the organization of everyday life. Prior work suggests cross-cultural variation in rates of child-directed speech is due to the former factor, but has been fraught with confounds in comparing postindustrial and subsistence farming communities. We investigate the daylong language environments of children (0;0–3;0) on Rossel Island, Papua New Guinea, a small-scale traditional community where prior ethnographic study demonstrated contingency-seeking child interaction styles. In fact, children were infrequently directly addressed and linguistic input rate was primarily affected by situational factors, though children’s vocalization maturity showed no developmental delay. We compare the input characteristics between this community and a Tseltal Mayan one in which near-parallel methods produced comparable results, then briefly discuss the models and mechanisms for learning best supported by our findings.
  • Cychosz, M., Cristia, A., Bergelson, E., Casillas, M., Baudet, G., Warlaumont, A. S., Scaff, C., Yankowitz, L., & Seidl, A. (2021). Vocal development in a large‐scale crosslinguistic corpus. Developmental Science, 24(5): e13090. doi:10.1111/desc.13090.

    Abstract

    This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1–36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., “ba” vs. “ee”). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter‐annotator reliability on the crowdsourcing platform, relative to more traditional in‐lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine‐grained annotation decisions. Audio clips used for this project are compiled into a large‐scale infant vocalization corpus that is available for other researchers to use in future work.

    Additional information

    supporting information; audio data
  • Frost, R. L. A., & Casillas, M. (2021). Investigating statistical learning of nonadjacent dependencies: Running statistical learning tasks in non-WEIRD populations. In SAGE Research Methods Cases. doi:10.4135/9781529759181.

    Abstract

    Language acquisition is complex. However, one thing that has been suggested to help learning is the way that information is distributed throughout language; co-occurrences among particular items (e.g., syllables and words) have been shown to help learners discover the words that a language contains and figure out how those words are used. Humans’ ability to draw on this information—“statistical learning”—has been demonstrated across a broad range of studies. However, evidence from non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) societies is critically lacking, which limits theorizing on the universality of this skill. We extended work on statistical language learning to a new, non-WEIRD linguistic population: speakers of Yélî Dnye, who live on a remote island off mainland Papua New Guinea (Rossel Island). We performed a replication of an existing statistical learning study, training adults on an artificial language with statistically defined words, then examining what they had learnt using a two-alternative forced-choice test. Crucially, we implemented several key amendments to the original study to ensure the replication was suitable for remote field-site testing with speakers of Yélî Dnye. We made critical changes to the stimuli and materials (to test speakers of Yélî Dnye, rather than English), the instructions (we re-worked these significantly, and added practice tasks to optimize participants’ understanding), and the study format (shifting from a lab-based to a portable tablet-based setup). We discuss the requirement for acute sensitivity to linguistic, cultural, and environmental factors when adapting studies to test new populations.

  • Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53, 818-835. doi:10.3758/s13428-020-01460-x.

    Abstract

    Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measuring linguistic input—estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, allowing automatic phoneme, syllable, and word count estimation from child-centered audio recordings.
  • Casillas, M., & Amaral, P. (2011). Learning cues to category membership: Patterns in children’s acquisition of hedges. In C. Cathcart, I.-H. Chen, G. Finley, S. Kang, C. S. Sandy, & E. Stickles (Eds.), Proceedings of the Berkeley Linguistics Society 37th Annual Meeting (pp. 33-45). Linguistic Society of America, eLanguage.

    Abstract

    When we think of children acquiring language, we often think of their acquisition of linguistic structure as separate from their acquisition of knowledge about the world. But it is clear that in the process of learning about language, children consult what they know about the world; and that in learning about the world, children use linguistic cues to discover how items are related to one another. This interaction between the acquisition of linguistic structure and the acquisition of category structure is especially clear in word learning.
