Recent studies have identified neural correlates of language effects on perception in static domains of experience such as colour and objects. The generalization of such effects to dynamic domains like motion events remains elusive. Here, we focus on grammatical differences between languages relevant for the description of motion events and their impact on visual scene perception. Two groups of native speakers of German or English were presented with animated videos featuring a dot travelling along a trajectory towards a geometrical shape (endpoint). English is a language with grammatical aspect in which attention is drawn to trajectory and endpoint of motion events equally. German, in contrast, is a non-aspect language which highlights endpoints. We tested the comparative perceptual saliency of trajectory and endpoint of motion events by presenting motion event animations (primes) followed by a picture symbolising the event (target): In 75% of trials, the animation was followed by a mismatching picture (both trajectory and endpoint were different); in 10% of trials, only the trajectory depicted in the picture matched the prime; in 10% of trials, only the endpoint matched the prime; and in 5% of trials both trajectory and endpoint were matching, which was the condition requiring a response from the participant. In Experiment 1 we recorded event-related brain potentials elicited by the picture in native speakers of German and native speakers of English. German participants exhibited a larger P3 wave in the endpoint match than the trajectory match condition, whereas English speakers showed no P3 amplitude difference between conditions. In Experiment 2 participants performed a behavioural motion matching task using the same stimuli as those used in Experiment 1. German and English participants did not differ in response times showing that motion event verbalisation cannot readily account for the difference in P3 amplitude found in the first experiment. We argue that, even in a non-verbal context, the grammatical properties of the native language and associated sentence-level patterns of event encoding influence motion event perception, such that attention is automatically drawn towards aspects highlighted by the grammar.