Disclosed are an image processing device, an endoscope system, a program, an image processing method, and the like that prevent an attention area from being missed and make it possible to reliably identify an attention area. The image processing device includes: a first image acquisition section that acquires a first image having information within the wavelength band of white light; a second image acquisition section that acquires a second image having information within a specific wavelength band; an attention area detection section that detects an attention area within the second image based on a feature quantity of each pixel within the second image; a display state setting section that performs a display state setting process, which sets the display state of a display image generated based on the first image; and a designated elapsed time setting section that performs a designated elapsed time setting process, which sets a designated elapsed time based on the detection result for the attention area. The display state setting section performs the display state setting process based on the designated elapsed time set by the designated elapsed time setting section.
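The interplay between the detection result, the designated elapsed time, and the display state can be illustrated with a minimal sketch. It is not the disclosed implementation. The threshold test on the per-pixel feature quantity, the constant values, the one-second designated elapsed time, the "alert" display state, and all function and class names are hypothetical assumptions chosen only to show how a detection in the second (specific-band) image could keep an alert on the white-light display image for a set time, even after the attention area leaves the field of view.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical parameters: the abstract does not specify any values.
FEATURE_THRESHOLD = 0.5        # per-pixel feature quantity threshold (assumed)
MIN_AREA_PIXELS = 3            # minimum pixel count to report an attention area (assumed)
DESIGNATED_ELAPSED_TIME = 1.0  # seconds the alert persists after a detection (assumed)

Image = List[List[float]]


def detect_attention_area(second_image: Image) -> bool:
    """Assumed detector: count pixels of the specific-band image whose
    feature quantity exceeds the threshold and compare with a minimum area."""
    count = sum(1 for row in second_image for v in row if v > FEATURE_THRESHOLD)
    return count >= MIN_AREA_PIXELS


@dataclass
class DisplayStateController:
    """Hypothetical combination of the designated elapsed time setting
    section and the display state setting section."""
    expires_at: Optional[float] = None  # time at which the designated elapsed time ends

    def update(self, now: float, detected: bool) -> str:
        # Designated elapsed time setting: (re)start the timer on each detection.
        if detected:
            self.expires_at = now + DESIGNATED_ELAPSED_TIME
        # Display state setting: keep the alert until the designated time has elapsed.
        if self.expires_at is not None and now < self.expires_at:
            return "alert"
        return "normal"


def make_display_image(first_image: Image, state: str) -> dict:
    """Hypothetical display image: the white-light image plus an alert flag."""
    return {"pixels": first_image, "alert": state == "alert"}


if __name__ == "__main__":
    ctrl = DisplayStateController()
    frames = [(0.0, True), (0.5, False), (1.2, False)]  # (time in s, detected?)
    for t, hit in frames:
        print(t, ctrl.update(t, hit))
```

With the frames above, a detection at 0.0 s keeps the display in the "alert" state at 0.5 s even though nothing is detected in that frame, and the display returns to "normal" at 1.2 s once the assumed one-second designated elapsed time has passed. Restarting the timer on every detection is one possible design choice; the abstract leaves open how the designated elapsed time depends on the detection result.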