An image processing apparatus according to the present invention includes: a first feature value calculation unit adapted to calculate a first feature value for each pixel in an image obtained by image pickup of living tissue, where the first feature value represents a value of an index indicating the shape of a local region that includes a pixel of interest and each pixel in a neighborhood of the pixel of interest; a second feature value calculation unit adapted to calculate a second feature value for each pixel in the image, where the second feature value represents a state of distribution of gradient directions in the local region; an evaluation value calculation unit adapted to calculate, for each pixel, a geometric evaluation value that enables a structure of a predetermined shape contained in the image to be distinguished, based on calculation results of the first feature value and the second feature value; and a region extraction unit adapted to extract, from the image, a candidate region estimated to contain the structure of the predetermined shape, based on a calculation result of the geometric evaluation value.
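By way of non-limiting illustration, the four units described above could be sketched in Python with NumPy as follows. The passage above does not fix any particular formula, so every concrete choice here is an assumption made only for illustration: the first feature value is taken to be a shape index derived from the eigenvalues of the local Hessian, the second feature value is taken to be a measure of how evenly gradient directions are spread within the local region, the geometric evaluation value is taken to be their product, and the candidate region is obtained by a fixed threshold. The function names, the window radius, and the threshold are likewise hypothetical.

```python
import numpy as np


def box_mean(a, radius):
    """Mean of `a` over a (2*radius+1) x (2*radius+1) window, edge-padded."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    h, w = a.shape
    total = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (k * k)


def first_feature(img):
    """Assumed first feature: shape index in [0, 1] from Hessian eigenvalues."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)          # derivatives along rows (y) and columns (x)
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    hxy = 0.5 * (gxy + gyx)            # symmetrized mixed derivative
    mean = 0.5 * (gxx + gyy)           # (l1 + l2) / 2 for eigenvalues l1 >= l2
    diff = np.sqrt((0.5 * (gxx - gyy)) ** 2 + hxy ** 2)  # (l1 - l2) / 2 >= 0
    # Near 1 for a bright blob, 0.5 where flat, near 0 for a dark pit.
    return 0.5 - np.arctan2(mean, diff) / np.pi


def second_feature(img, radius=3):
    """Assumed second feature: spread of gradient directions in the local region.

    High when non-negligible gradients in the window point in many different
    directions, low when they share one direction or when the region is flat.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    strong = mag > 1e-3 * mag.max()    # ignore negligible gradients (assumed cutoff)
    ux = np.divide(gx, mag, out=np.zeros_like(mag), where=strong)
    uy = np.divide(gy, mag, out=np.zeros_like(mag), where=strong)
    coverage = box_mean(strong.astype(float), radius)
    resultant = np.hypot(box_mean(ux, radius), box_mean(uy, radius))
    return np.clip(coverage - resultant, 0.0, 1.0)


def geometric_evaluation(img, radius=3):
    """Assumed evaluation value: product of the two feature values, per pixel."""
    return first_feature(img) * second_feature(img, radius)


def extract_candidates(img, threshold=0.5, radius=3):
    """Assumed extraction: pixels whose evaluation value exceeds a threshold."""
    return geometric_evaluation(img, radius) > threshold


# Usage on a synthetic bright blob standing in for a locally raised structure.
yy, xx = np.mgrid[0:41, 0:41]
blob = np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 4.0 ** 2))
mask = extract_candidates(blob)
print(mask[20, 20], mask[0, 0])
```

In this sketch the product is used so that a pixel is retained only when both feature values agree; a pixel on a uniform slope, for example, has gradients that all point the same way and is therefore suppressed regardless of its shape index. Other combinations of the two feature values would fit the description above equally well.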