An image processing apparatus includes a special light image acquisition unit that acquires a special light image having information in a specific wavelength band, a generation unit that generates depth information at a predetermined position in a living body using the special light image, and a detection unit that detects a predetermined region using the depth information. The image processing apparatus further includes a normal light image acquisition unit that acquires a normal light image having information in the wavelength band of white light. The specific wavelength band is, for example, the infrared band. The generation unit calculates a difference in the depth direction between the special light image and the normal light image to generate the depth information at the predetermined position. The detection unit detects, as a bleeding point, a position at which the depth information is equal to or greater than a predetermined threshold. The present technology is applicable to an endoscope.
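
The generation and detection steps described above can be illustrated with a minimal sketch. It assumes that a per-pixel depth estimate has already been obtained for each image, which the abstract does not specify, and that the difference is taken as special-light depth minus normal-light depth. The function names `depth_difference` and `detect_bleeding_points`, the nested-list representation, and the sample values are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a depth map is a list of rows of depth values.
DepthMap = List[List[float]]


def depth_difference(special_depth: DepthMap, normal_depth: DepthMap) -> DepthMap:
    """Per-pixel difference in the depth direction (special minus normal).

    The sign convention is an assumption made for this sketch.
    """
    if len(special_depth) != len(normal_depth):
        raise ValueError("depth maps must have the same number of rows")
    result: DepthMap = []
    for special_row, normal_row in zip(special_depth, normal_depth):
        if len(special_row) != len(normal_row):
            raise ValueError("depth maps must have the same row lengths")
        result.append([s - n for s, n in zip(special_row, normal_row)])
    return result


def detect_bleeding_points(depth_info: DepthMap, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, column) positions whose depth information is >= threshold."""
    return [
        (r, c)
        for r, row in enumerate(depth_info)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Made-up sample values, for illustration only.
special = [[10.0, 10.0, 10.0], [10.0, 12.5, 10.0]]
normal = [[10.0, 9.5, 10.0], [10.0, 10.0, 10.0]]

depth_info = depth_difference(special, normal)
bleeding_points = detect_bleeding_points(depth_info, threshold=1.0)
```

With these sample values, only the position whose depth difference reaches the threshold of 1.0 is reported as a candidate bleeding point.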