Techniques for controlling a robotic device in motion using image synthesis are presented. A method includes determining local motion features based on image patches included in first and second input images captured by a camera installed on the robotic device; determining a camera motion based on the local motion features, wherein the camera motion indicates a movement of the camera between capture of the first input image and capture of the second input image; determining a scene geometry based on the camera motion, wherein the scene geometry is a set of three-dimensional coordinates corresponding to two-dimensional image coordinates of the image patches; generating a single-perspective synthesized image based on the camera motion and the scene geometry; detecting at least one change between the first and second input images based on the synthesized image; and modifying motion of the robotic device based on the detected at least one change.
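The abstract does not say how the synthesized image is rendered or compared. What follows is a minimal sketch of the last three steps under several assumptions. The camera is taken to be a pinhole camera with known intrinsics `K`. Per-pixel depth for the first image stands in for the scene geometry. The camera motion is taken to be a rigid transform `(R, t)` that maps first-camera coordinates to second-camera coordinates, and the images are grayscale. Under these assumptions, the first image is forward-warped into the second camera's viewpoint and compared pixel-wise with the second image. A stop-if-too-much-changed rule then adjusts the robot's speed. The names `synthesize_view`, `detect_changes` and `adjust_speed`, the threshold and the stopping rule are all hypothetical. They are illustrative choices, not the claimed method.

```python
import numpy as np


def synthesize_view(img1, depth1, K, R, t):
    """Forward-warp img1 into the second camera's viewpoint (assumed pinhole model).

    depth1: per-pixel depth of img1 (assumed stand-in for the scene geometry).
    R, t:   assumed rigid camera motion, X2 = R @ X1 + t.
    Pixels that receive no source point are left as NaN.
    """
    h, w = img1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    # Back-project every pixel to a 3D point in camera-1 coordinates.
    pts1 = np.linalg.inv(K) @ pix * depth1.ravel()
    # Express the points in camera-2 coordinates using the camera motion.
    pts2 = R @ pts1 + np.asarray(t, dtype=float).reshape(3, 1)
    # Project into the second image plane; guard the divide for points behind the camera.
    proj = K @ pts2
    z = pts2[2]
    z_safe = np.where(z > 0, z, 1.0)
    u = np.round(proj[0] / z_safe).astype(int)
    v = np.round(proj[1] / z_safe).astype(int)
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    synth = np.full((h, w), np.nan)
    zbuf = np.full((h, w), np.inf)  # keep the nearest point when several land on one pixel
    vals = img1.ravel()
    for i in np.flatnonzero(valid):
        if z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            synth[v[i], u[i]] = vals[i]
    return synth


def detect_changes(synth, img2, threshold):
    """Boolean change mask: |synth - img2| > threshold; holes (NaN) are not counted."""
    diff = np.abs(synth - img2)
    diff = np.where(np.isnan(diff), 0.0, diff)
    return diff > threshold


def adjust_speed(speed, change_mask, max_fraction):
    """Hypothetical policy: stop if the changed-pixel fraction exceeds max_fraction."""
    return 0.0 if change_mask.mean() > max_fraction else speed
```

Pixels that receive no source point, such as disocclusions and image borders, are left as NaN and are not counted as changes. A practical system would likely need to treat them more carefully, for example by masking them out or filling them from neighboring views.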