An image processing apparatus includes: an image acquisition section that acquires a plurality of images that differ in in-focus state; a reference point setting section that performs, on each of the plurality of images, a reference point setting process that sets a reference point to an attention area; a distance estimation section that estimates, based on a pixel value corresponding to the reference point, distance information about a distance to a corresponding point, the corresponding point being a point in real space that corresponds to the reference point; and an additional information generation section that generates additional information based on the estimated distance information, the additional information being information that is added to the attention area to which the reference point is set.
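
The flow described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: only two images are used (one focused nearer, one farther), the "pixel value corresponding to the reference point" is summarized as a local contrast measure, the contrast ratio is mapped to a distance through a hypothetical calibration table, and the additional information is taken to be an estimated physical width of the attention area. All function names, parameters, and calibration values are illustrative, not the apparatus's actual method.

```python
from bisect import bisect_left


def local_contrast(image, point, radius=1):
    """Mean absolute difference of horizontally adjacent pixels around `point`.

    `image` is a list of rows of grayscale values; `point` is (row, col).
    """
    row, col = point
    total, count = 0.0, 0
    for r in range(max(0, row - radius), min(len(image), row + radius + 1)):
        for c in range(max(0, col - radius),
                       min(len(image[0]) - 1, col + radius + 1)):
            total += abs(image[r][c + 1] - image[r][c])
            count += 1
    return total / count if count else 0.0


def estimate_distance(near_img, far_img, point, calibration):
    """Estimate the distance to the real-space point behind `point`.

    Assumption: `calibration` is a sorted list of (contrast_ratio, distance_mm)
    pairs measured beforehand; the values used here are hypothetical.
    Returns None when the neighborhood has no texture in either image.
    """
    c_near = local_contrast(near_img, point)
    c_far = local_contrast(far_img, point)
    if c_near + c_far == 0:
        return None
    ratio = c_near / (c_near + c_far)  # 1.0 means sharp only in the near image
    ratios = [r for r, _ in calibration]
    i = bisect_left(ratios, ratio)
    if i == 0:
        return calibration[0][1]
    if i == len(calibration):
        return calibration[-1][1]
    (r0, d0), (r1, d1) = calibration[i - 1], calibration[i]
    # Piecewise-linear interpolation between the two nearest table entries.
    return d0 + (d1 - d0) * (ratio - r0) / (r1 - r0)


def make_additional_info(distance_mm, area_width_px, mm_per_px_per_mm):
    """Build additional information for the attention area.

    Assumption: physical width scales linearly with distance, with
    `mm_per_px_per_mm` as a hypothetical optical constant.
    """
    return {
        "distance_mm": distance_mm,
        "width_mm": area_width_px * mm_per_px_per_mm * distance_mm,
    }


# Usage with synthetic data: the near-focus image is sharp, the far one is flat.
sharp = [[0, 10, 0, 10] for _ in range(3)]
flat = [[5, 5, 5, 5] for _ in range(3)]
table = [(0.0, 50.0), (1.0, 10.0)]  # hypothetical calibration
distance = estimate_distance(sharp, flat, (1, 1), table)
info = make_additional_info(distance, area_width_px=20, mm_per_px_per_mm=0.01)
```

A lookup table is used here only because it keeps the sketch independent of any particular optical model; a real system would substitute whatever relation between defocus and distance its optics support.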