An image processing apparatus comprising:
boundary extraction means for detecting boundaries of retina layers from a tomogram of an eye to be examined;
exudate extraction means for extracting an exudate region from a fundus image of the eye to be examined;
registration means for performing registration between the tomogram and the fundus image, and calculating a spatial correspondence between the tomogram and the fundus image;
specifying means for specifying a region where an exudate exists in the tomogram using the boundaries of the retina layers, the exudate region, and the spatial correspondence;
likelihood calculation means for calculating likelihoods of existence of the exudate in association with the specified region; and
tomogram exudate extraction means for extracting an exudate region in the tomogram from the specified region using the likelihoods.
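The last three elements recited above (specifying, likelihood calculation, and extraction) can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the layer boundaries, the fundus exudate mask, and the registration result are already available, that the spatial correspondence is a pure integer translation `offset`, and that the likelihood is a min-max normalised voxel intensity compared against a fixed threshold. The function names, data layouts, and the threshold value are hypothetical and do not come from the claim.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (y, x, z), with z the depth index along an A-scan


def specify_region(fundus_mask: List[List[bool]],
                   upper: List[List[int]],
                   lower: List[List[int]],
                   offset: Tuple[int, int]) -> List[Voxel]:
    """Collect candidate voxels between two layer boundaries.

    Assumption: tomogram A-scan (y, x) maps to fundus pixel (y + dy, x + dx).
    """
    dy, dx = offset
    region: List[Voxel] = []
    for y in range(len(upper)):
        for x in range(len(upper[y])):
            fy, fx = y + dy, x + dx
            inside = 0 <= fy < len(fundus_mask) and 0 <= fx < len(fundus_mask[0])
            if inside and fundus_mask[fy][fx]:
                # Keep every depth sample between the two boundaries, inclusive.
                for z in range(upper[y][x], lower[y][x] + 1):
                    region.append((y, x, z))
    return region


def calc_likelihoods(volume: List[List[List[float]]],
                     region: List[Voxel]) -> Dict[Voxel, float]:
    """Hypothetical likelihood: intensity min-max normalised over the region."""
    if not region:
        return {}
    values = [volume[y][x][z] for (y, x, z) in region]
    low, span = min(values), max(values) - min(values)
    if span == 0:
        # No contrast inside the region, so treat it as no evidence.
        return {v: 0.0 for v in region}
    return {(y, x, z): (volume[y][x][z] - low) / span for (y, x, z) in region}


def extract_exudate(likelihoods: Dict[Voxel, float],
                    threshold: float = 0.5) -> List[Voxel]:
    """Keep voxels whose likelihood reaches the (assumed) threshold."""
    return sorted(v for v, p in likelihoods.items() if p >= threshold)
```

In this sketch the fundus mask only narrows the search to A-scans where an exudate was seen from the front, and the layer boundaries bound the depth range. Which depth samples are kept is decided by the likelihood step, so a different likelihood model could be substituted without changing the other two functions.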