The image processing device according to the present invention has: a region-of-interest detection unit for detecting a region of interest in each of a plurality of observation images obtained by capturing images of a subject, the plurality of observation images being inputted in sequence to the region-of-interest detection unit; a recording unit for sequentially recording the plurality of observation images as recorded images during a first period from a first detection start, at which detection of a first region of interest is started, until a first detection stoppage, at which detection of the first region of interest is stopped, or during a second period from the first detection start until a second detection stoppage, at which detection of a second region of interest is stopped; a calculation unit for calculating a display timing at which reproduction of the plurality of recorded images is started, on the basis of the time of at least one of the first detection start, the first detection stoppage, a second detection start, and the second detection stoppage; and a display control unit for performing processing for causing, at the display timing, at least one recorded image among the recorded images recorded by the recording unit to be displayed on a display screen of a display device while causing the plurality of observation images to be sequentially displayed on the display screen.
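The interplay of the recording unit, the calculation unit, and the display control unit can be illustrated with a minimal sketch. This is not the claimed implementation: the class name `ReplayController`, the `delay` parameter, the use of integers as stand-ins for frames, and the choice to start reproduction at the first detection stoppage and loop the recording are all assumptions made for illustration. Only the first period is modelled; the second region of interest is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReplayController:
    """Hypothetical sketch: record frames while a region of interest is
    detected, then replay them alongside the live observation images."""

    delay: int = 0  # assumed: frames to wait after detection stops before replay
    recorded: List[int] = field(default_factory=list)
    detection_start: Optional[int] = None
    detection_stop: Optional[int] = None
    display_timing: Optional[int] = None
    _was_detected: bool = False

    def step(self, t: int, frame: int, detected: bool) -> Tuple[int, Optional[int]]:
        """Process one observation image.

        Returns (live frame, replayed recorded frame or None).
        """
        if detected and not self._was_detected:
            # Detection start: begin a fresh recording period.
            self.detection_start = t
            self.detection_stop = None
            self.display_timing = None
            self.recorded = []
        if detected:
            # Recording unit: store frames sequentially during the period.
            self.recorded.append(frame)
        if not detected and self._was_detected:
            # Detection stoppage: the calculation unit derives the display
            # timing from this time (assumed rule: stoppage time plus delay).
            self.detection_stop = t
            self.display_timing = t + self.delay
        self._was_detected = detected

        # Display control: the live frame is always shown; from the display
        # timing onward, the recorded frames are replayed in a loop.
        replay = None
        if self.display_timing is not None and t >= self.display_timing and self.recorded:
            replay = self.recorded[(t - self.display_timing) % len(self.recorded)]
        return frame, replay
```

Calling `step` once per incoming frame with the detector's result yields the live frame together with the recorded frame to show next to it, if any. A real device would derive the display timing from whichever of the detection start and stoppage times the design calls for; the stoppage-plus-delay rule here is only one possibility.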