Our ability to use language is multimodal and requires tight coordination between what is expressed in speech and in gesture, such as pointing or iconic gestures that convey semantic, syntactic and pragmatic information related to speakers’ messages. Interestingly, what is expressed in gesture and how it is coordinated with speech differs in speakers of different languages. This paper discusses recent findings on the development of children’s multimodal expressions taking cross-linguistic variation into account. Although some aspects of speech-gesture development show language-specificity from an early age, it might still take children until nine years of age to exhibit fully adult patterns of cross-linguistic variation. These findings reveal insights about how children coordinate different levels of representations given that their development is constrained by patterns that are specific to their languages.