An image processing apparatus includes a tomographic signal obtaining unit and an information obtaining unit. The tomographic signal obtaining unit obtains tomographic signals of light beams having mutually different polarizations, the light beams being obtained by splitting combined light in which reference light is combined with return light from an object to be inspected that is irradiated with measuring light. The information obtaining unit obtains three-dimensional polarization tomographic information and three-dimensional motion contrast information of the object to be inspected, using at least one of the obtained tomographic signals in common for both. In addition, an extracting unit extracts a specific region of the object to be inspected using the obtained three-dimensional polarization tomographic information, and an image generating unit generates a motion contrast enface image of the extracted specific region using the obtained three-dimensional motion contrast information.
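The data flow described above can be illustrated with a minimal NumPy sketch. It is not the apparatus's actual processing: the abstract does not say which polarization metric, motion contrast measure, or extraction rule is used. The sketch assumes two complex tomographic signals `h` and `v` (one per polarization) of shape `(repeats, z, x, y)`, takes phase retardation as the arctangent of their amplitude ratio, takes motion contrast as the normalized variance of intensity across repeated scans, and extracts the region with a hypothetical `threshold` on retardation. All function and parameter names are illustrative.

```python
import numpy as np


def polarization_and_motion(h, v, eps=1e-12):
    """Derive both 3-D volumes from the same pair of tomographic signals.

    h, v: complex arrays of shape (repeats, z, x, y), one per polarization
    channel (assumed layout). Returns (retardation, motion), each (z, x, y).
    """
    # Polarization information (assumed metric): phase retardation from the
    # ratio of the repeat-averaged amplitudes, in radians within [0, pi/2].
    amp_h = np.abs(h).mean(axis=0)
    amp_v = np.abs(v).mean(axis=0)
    retardation = np.arctan2(amp_v, amp_h)

    # Motion contrast (assumed metric): variance of total intensity across
    # repeated scans, normalized by the squared mean intensity.
    intensity = np.abs(h) ** 2 + np.abs(v) ** 2
    mean = intensity.mean(axis=0)
    motion = intensity.var(axis=0) / (mean ** 2 + eps)
    return retardation, motion


def extract_region(retardation, threshold):
    """Boolean (z, x, y) mask of the specific region (assumed: a threshold)."""
    return retardation > threshold


def motion_contrast_enface(motion, mask):
    """Average motion contrast over depth inside the mask, giving (x, y)."""
    count = mask.sum(axis=0)
    total = np.where(mask, motion, 0.0).sum(axis=0)
    # Columns with no masked voxel are left at zero.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)
```

Both volumes are computed from the same `h` and `v` arrays, which mirrors the common use of the tomographic signals; the mask comes only from the polarization volume, and the enface image only from the motion contrast volume restricted to that mask.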