An image processing apparatus includes: a basic shape matching section that extracts, as a structure region, a predetermined structural object included in an image obtained by picking up an image of a mucosal surface of a living body, divides the structure region into regions each including at least one pixel, and matches each of the resulting regions with either a first region having a first basic shape or a second region having a second basic shape; a feature value calculating section that sequentially sets regions of interest from among the regions matched by the basic shape matching section and calculates, for each region of interest, the counts of first regions and second regions adjacent to it; and a classification section that classifies the structure region based on a result of the calculation by the feature value calculating section.
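The adjacency-counting step can be illustrated with a minimal sketch. It assumes that shape matching has already produced a label grid in which each region is a single pixel. In that grid, `0` marks background outside the structure region, `1` a pixel matched to the first basic shape, and `2` a pixel matched to the second. The abstract specifies none of the following, so all of them are hypothetical: the 8-connected neighbourhood, the functions `adjacent_counts` and `classify`, the ratio-threshold rule, and the class names.

```python
from typing import Dict, List, Tuple

# Hypothetical label encoding produced by a prior shape-matching step.
BACKGROUND, FIRST, SECOND = 0, 1, 2

# Assumption: regions are adjacent if they touch in the 8-connected sense.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def adjacent_counts(labels: List[List[int]]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Set each matched pixel in turn as the region of interest and count
    how many adjacent pixels are first-shape and second-shape regions."""
    rows, cols = len(labels), len(labels[0])
    counts: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == BACKGROUND:
                continue  # only matched regions become regions of interest
            n_first = n_second = 0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # ignore out-of-image neighbours
                    if labels[rr][cc] == FIRST:
                        n_first += 1
                    elif labels[rr][cc] == SECOND:
                        n_second += 1
            counts[(r, c)] = (n_first, n_second)
    return counts


def classify(counts: Dict[Tuple[int, int], Tuple[int, int]], threshold: float = 0.5) -> str:
    """Hypothetical rule: label the structure region by the share of
    second-shape neighbours among all counted neighbours."""
    total_first = sum(f for f, _ in counts.values())
    total_second = sum(s for _, s in counts.values())
    total = total_first + total_second
    if total == 0:
        return "unclassified"
    return "class_B" if total_second / total > threshold else "class_A"


if __name__ == "__main__":
    grid = [
        [0, 1, 1, 0],
        [0, 1, 2, 0],
        [0, 0, 2, 0],
    ]
    feature = adjacent_counts(grid)
    print(feature[(1, 1)])    # (2, 2)
    print(classify(feature))  # class_A (6 of 16 neighbours are second-shape)
```

Keeping the per-region counts separate from the classification rule mirrors the split between the feature value calculating section and the classification section. Any other decision rule, such as a trained classifier over a histogram of `(n_first, n_second)` pairs, could replace `classify` without changing the counting step.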