From the first day of life the human infant produces vocalizations that are not tied to distress but instead appear to manifest a deep endogenous tendency to explore vocalization for its own sake. We call these vocalizations protophones (vowel-like sounds, squeals, growls, raspberries and so on), and we are increasingly certain that producing them provides a necessary foundation for subsequent steps in vocal language development1. Each of them can be produced flexibly, as all words and sentences can be, with a wide range of affective content on different occasions, including neutral affect2,3. The predominance of neutral affect in protophone production suggests interest in sound itself more than a need to communicate. Through research with all-day recordings, randomly sampled and subjected to human coding, we now know that protophones occur at massive rates throughout the first year of life, at 4-5 protophones per minute. Even infants born prematurely, at 32 weeks gestational age, still in neonatal intensive care and just able to breathe on their own, produce hundreds of protophones per day4. The protophone rate is at least 5 times higher than the rate of crying at every month across the first year of life.
Perhaps even more remarkable, more than 70% of protophones across the first year appear to be produced with no social intent, which is to say they are directed to no one5. Even when infants are comfortable, awake, and alone in a room, they produce an average of more than 4 protophones per minute.
Bonobo infants produce screams and laughs that can be analogized to crying and laughter in human infants, but they produce few protophone-like vocalizations, far fewer than one-tenth as many as human infants at similar ages6. In addition, the protophone-like vocalizations of bonobo infants are not observed to occur in vocal play or exploration at all but instead predominantly express distress. Similar observations appear to apply to other great apes7.
What could the selection pressures have been that led ancient hominins toward such flexible and copious vocalization before there was full-fledged language? We argue that the human (and ancient hominin) infant, in need of long-term commitment of care because of altriciality8, was selected to produce fitness signals that would serve (unintentionally) to solicit such care and long-term commitment9,10. Among the many ways that fitness could be signaled, the vocal channel appears to have come under special selection pressure. We propose that flexible vocalization evolved to serve both as a basis for caregivers' selection of infants especially worthy of investment and as a basis for caregiver-infant bonding through vocal interaction.