An image processing device includes an image sequence acquisition section that acquires an input image sequence that includes first to N-th images, and a processing section that performs an image summarization process that deletes some of the first to N-th images to generate a summary image sequence, the processing section selecting an s-th (s is an integer that satisfies 0≤s≤N+1) image to be a provisional summary image, selecting a t-th (t is an integer that satisfies 0≤t≤s−1) image to be a provisional preceding summary image, selecting a u-th (u is an integer that satisfies t