Provided are: a scene-transition-image extraction unit (12) that extracts, as a scene transition image, an image where a scene transition occurs from an image sequence captured in time series, using a predetermined extraction condition; a display unit (3) that displays a near image, which is an image captured at a time in the neighborhood of the time when the scene transition image was captured; an operation history acquiring unit (14) that acquires image information of an image, among the near images, for which a predetermined viewing operation is performed, together with history information of that viewing operation; and an extraction condition changing unit (15) that changes the extraction condition using the image information and the history information acquired by the operation history acquiring unit (14). The scene-transition-image extraction unit (12) then re-extracts images where a scene transition occurs, using the extraction condition changed by the extraction condition changing unit (15).
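
The feedback loop described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the abstract: each image is reduced to a list of numeric features, the extraction condition is modeled as per-feature weights plus a threshold on a weighted inter-frame difference, and the condition is changed by raising the weights of the features that changed most at frames the user operated on. The names `ExtractionCondition`, `ViewingOperation`, `change_score`, `near_images`, and `change_condition`, and the `rate` and `radius` parameters, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractionCondition:
    """Hypothetical extraction condition: feature weights and a threshold."""
    weights: list[float]
    threshold: float


@dataclass
class ViewingOperation:
    """Hypothetical history record: which frame was operated on, and how."""
    frame_index: int
    kind: str  # e.g. "pause" or "replay"


def change_score(prev, curr, cond):
    """Weighted absolute feature difference between two consecutive frames."""
    return sum(w * abs(a - b) for w, a, b in zip(cond.weights, curr, prev))


def extract_scene_transitions(frames, cond):
    """Role of unit (12): indices whose change score meets the threshold."""
    return [
        i for i in range(1, len(frames))
        if change_score(frames[i - 1], frames[i], cond) >= cond.threshold
    ]


def near_images(index, n_frames, radius=1):
    """Frames captured close in time to a scene transition image."""
    lo, hi = max(0, index - radius), min(n_frames, index + radius + 1)
    return [j for j in range(lo, hi) if j != index]


def change_condition(frames, cond, history, rate=0.5):
    """Role of unit (15): boost weights of features that changed at operated frames."""
    weights = list(cond.weights)
    for op in history:
        i = op.frame_index
        if i == 0:
            continue  # the first frame has no predecessor to compare against
        diffs = [abs(a - b) for a, b in zip(frames[i], frames[i - 1])]
        total = sum(diffs)
        if total == 0:
            continue  # nothing changed, so the operation carries no signal here
        weights = [w + rate * d / total for w, d in zip(weights, diffs)]
    return ExtractionCondition(weights, cond.threshold)


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.0, 0.4], [1.0, 0.4], [1.0, 0.4]]
    cond = ExtractionCondition(weights=[1.0, 1.0], threshold=0.5)
    first = extract_scene_transitions(frames, cond)
    # The user pauses on a near image of the first extracted transition.
    history = [ViewingOperation(near_images(first[0], len(frames))[0], "pause")]
    updated = change_condition(frames, cond, history)
    print(first, extract_scene_transitions(frames, updated))
```

In this toy run the initial condition extracts only frame 2. After a pause on the neighboring frame 1 is recorded, the weight of the second feature rises, and re-extraction with the changed condition also returns frame 1. A real system would use its own image features and update rule; the sketch only shows how the four units hand data to one another.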