In dialogue, people do not just produce utterances; they work together to make sure that their contributions are understood. How do they do that? Prior work has pointed to two central phenomena: cross-speaker repetition of behaviour (alignment) and clarification requests (other-initiated repair). This thesis investigates the multimodal character of these phenomena, using quantitative and qualitative analyses of task-based interactions with novel 3D objects (called “Fribbles”). The findings show that people deploy speech and gestures in flexible ways as part of alignment and other-initiated repair sequences, forming effective multimodal strategies to negotiate mutual understanding. Furthermore, when people work together to resolve interactional trouble, they distribute the multimodal labour across repair initiations and repair solutions in predictable ways, thereby minimizing collaborative effort at the level of the dyad. Together, these findings show that social interaction is a form of joint action, in which people reach their shared goal of mutual understanding through collaborative and multimodal language use.