The tracking of entities in discourse is known to be a bimodal phenomenon. Speakers achieve cohesion in speech by alternating between full lexical forms, pronouns, and zero anaphora as they track referents. They also track referents in co-speech gestures. In this study, we explored how viewpoint is deployed in reference tracking, focusing on representations of animate entities in German narrative discourse. We found that gestural viewpoint systematically varies depending on discourse context. Speakers predominantly use character viewpoint in maintained contexts and observer viewpoint in reintroduced contexts. Thus, gestural viewpoint seems to function as a cohesive device in narrative discourse. The findings expand on and provide further evidence for the coordination between speech and gesture on the discourse level that is crucial to understanding the tight link between the two modalities.