The present invention is intended to reduce an observer's burden in observing a group of images captured in time series. In an image processing device (2) according to an embodiment of the present invention, an interest area detector (11) detects interest areas in a time-series in-vivo image group. A feature amount calculation unit (12) calculates feature amounts indicative of features of the interest areas. An area classification unit (13) classifies the interest areas into area groups based on the feature amounts of the interest areas and the time-series positions of the time-series images that include the interest areas. A group feature amount calculation unit (14) calculates, for each area group, a group feature amount indicative of features of that area group. An area selection unit (15) selects one or more representative areas from each area group based on the group feature amounts. A representative image output unit (16) outputs, from the time-series image group, representative images that include the representative areas.
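
The flow from classification (13) through representative image output (16) can be illustrated with a minimal sketch. It is not the claimed implementation, and it makes several assumptions the abstract does not state. Interest areas are taken as already detected. Each feature amount is a single scalar. Areas are grouped when they are close in both frame position and feature amount. The group feature amount is the mean feature. The representative area is the one nearest that mean. All names (`InterestArea`, `classify_areas`, `max_feature_gap`, `max_frame_gap`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterestArea:
    frame_index: int  # time-series position of the image containing the area
    feature: float    # scalar feature amount (assumed; real features may be vectors)


def classify_areas(areas: List[InterestArea],
                   max_feature_gap: float = 0.25,
                   max_frame_gap: int = 1) -> List[List[InterestArea]]:
    """Group areas that are adjacent in time and similar in feature amount."""
    groups: List[List[InterestArea]] = []
    for area in sorted(areas, key=lambda a: a.frame_index):
        if groups:
            last = groups[-1][-1]
            close_in_time = area.frame_index - last.frame_index <= max_frame_gap
            close_in_feature = abs(area.feature - last.feature) <= max_feature_gap
            if close_in_time and close_in_feature:
                groups[-1].append(area)
                continue
        groups.append([area])  # start a new area group
    return groups


def group_feature(group: List[InterestArea]) -> float:
    """Group feature amount, assumed here to be the mean feature of the group."""
    return sum(a.feature for a in group) / len(group)


def select_representative(group: List[InterestArea]) -> InterestArea:
    """Pick the area whose feature is nearest the group feature amount."""
    centre = group_feature(group)
    return min(group, key=lambda a: abs(a.feature - centre))


def representative_frames(areas: List[InterestArea]) -> List[int]:
    """Return frame indices of the images to present to the observer."""
    return [select_representative(g).frame_index for g in classify_areas(areas)]


areas = [
    InterestArea(0, 0.25), InterestArea(1, 0.5), InterestArea(2, 0.75),
    InterestArea(10, 2.0), InterestArea(11, 2.0),
]
print(representative_frames(areas))  # [1, 10]
```

In this example five detected areas collapse into two groups, so the observer reviews two images rather than five. Other grouping criteria or selection rules could be substituted without changing the overall structure.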