Provided is an image processing device including: an illuminating portion that irradiates a subject with illumination light and excitation light; a fluorescence image-acquisition portion that acquires a fluorescence image by capturing fluorescence generated at the subject; a return-light image-acquisition portion that acquires a return-light image by capturing return light returning from the subject; a color-image generating portion that generates a plurality of color images by adding different types of color information that constitute a color space to the acquired fluorescence image and return-light image; and an image combining portion that combines the plurality of generated color images, wherein the color-image generating portion subjects at least one of the fluorescence image and the return-light image to correction processing in which the exponents of the distance characteristics of the fluorescence image and the return-light image, each approximated by an exponential function, are matched with each other.
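As a non-limiting illustration, the correction processing and the subsequent combining may be sketched as follows. The sketch assumes that each distance characteristic takes the form of a gain multiplied by the observation distance raised to an exponent; the exponent values, the gains, the channel assignment (fluorescence to green, return light to red), and all function names are hypothetical and are not taken from the disclosure.

```python
import math

# Hypothetical distance exponents (assumed values, not from the disclosure).
A_EXP = -1.5  # fluorescence intensity assumed to scale as distance ** A_EXP
B_EXP = -1.0  # return-light intensity assumed to scale as distance ** B_EXP


def match_exponent(pixels, own_exponent, target_exponent):
    """Raise each pixel to target/own so its distance exponent becomes target."""
    power = target_exponent / own_exponent
    return [[p ** power for p in row] for row in pixels]


def to_color(pixels, channel):
    """Place a grayscale image into one channel (0=R, 1=G, 2=B) of an RGB image."""
    return [
        [tuple(p if c == channel else 0.0 for c in range(3)) for p in row]
        for row in pixels
    ]


def combine(color_a, color_b):
    """Combine two RGB images of equal size by channel-wise addition."""
    return [
        [tuple(x + y for x, y in zip(pa, pb)) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(color_a, color_b)
    ]


def simulate(gain, exponent, distances):
    """Build a one-row test image whose pixels follow gain * distance ** exponent."""
    return [[gain * d ** exponent for d in distances]]


# The same subject observed at three distances (arbitrary units).
distances = [10.0, 20.0, 40.0]
fluorescence = simulate(100.0, A_EXP, distances)
return_light = simulate(50.0, B_EXP, distances)

# Correct only the return-light image so both images share the exponent A_EXP.
corrected_return = match_exponent(return_light, B_EXP, A_EXP)

# After matching, the fluorescence / return-light ratio no longer varies with distance.
ratios = [f / r for f, r in zip(fluorescence[0], corrected_return[0])]

# Assign each image to its own color channel and combine the two color images.
combined = combine(to_color(fluorescence, 1), to_color(corrected_return, 0))
print(ratios)
print(combined[0][0])
```

In this sketch only the return-light image is corrected, which corresponds to the "at least one of" wording above; raising both images to the reciprocals of their own exponents would match the exponents equally well under the same assumption.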