To make it easier to understand the correspondence between a motion contrast image acquired through OCTA and structural information acquired through OCT or the like when an image is displayed, a first image of a first area of an object to be inspected is acquired. In addition, for each of a plurality of different cross sections, an interference signal set corresponding to a plurality of frames, acquired with the intention of imaging the same cross section, is acquired. A motion contrast image of a second area included in the first area is then generated based on these interference signal sets, and information acquired from a part of the motion contrast image is superimposed, for display, at the corresponding position of the first image.
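As an illustration only, the processing summarized above can be sketched in Python with NumPy. The abstract does not state how the motion contrast is computed or how the overlay is rendered. This sketch therefore assumes three things: amplitude decorrelation between consecutive frames of each set, a maximum-intensity projection along depth to obtain an en-face image, and a fixed threshold that selects the "part" of the motion contrast image to superimpose. The function names `motion_contrast`, `motion_contrast_enface` and `superimpose`, the threshold, the overlay colour and the synthetic data are all hypothetical.

```python
import numpy as np


def motion_contrast(frames: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Motion contrast for one cross section (assumed method: amplitude decorrelation).

    frames: (N, Z, X) intensity B-scans acquired with the intention of imaging
    the same cross section, N >= 2. Returns a (Z, X) map holding the mean
    decorrelation of consecutive frame pairs. It is near 0 where the signal is
    static and larger where it changes between frames (e.g. blood flow).
    """
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames per cross section")
    a, b = frames[:-1], frames[1:]
    decorr = 1.0 - (a * b) / ((a ** 2 + b ** 2) / 2.0 + eps)
    return decorr.mean(axis=0)


def motion_contrast_enface(signal_sets: np.ndarray) -> np.ndarray:
    """signal_sets: (Y, N, Z, X), one frame set per cross section.

    Builds a (Y, Z, X) motion contrast volume for the second area, then
    projects it along depth (assumed: maximum-intensity projection) to a
    (Y, X) en-face image.
    """
    volume = np.stack([motion_contrast(s) for s in signal_sets])
    return volume.max(axis=1)


def superimpose(first_image, enface, top_left, threshold=0.5, color=(1.0, 0.0, 0.0)):
    """Paint en-face pixels at or above `threshold` onto a grayscale first image.

    first_image: (H, W) values in [0, 1] covering the first area.
    top_left: (row, col) of the second area inside the first image
    (assumed already registered). Returns an (H, W, 3) RGB image.
    """
    r0, c0 = top_left
    h, w = enface.shape
    if r0 < 0 or c0 < 0 or r0 + h > first_image.shape[0] or c0 + w > first_image.shape[1]:
        raise ValueError("second area must lie inside the first area")
    rgb = np.repeat(first_image[..., None], 3, axis=2).astype(float)
    region = rgb[r0:r0 + h, c0:c0 + w]   # basic slice: a view, so writes reach `rgb`
    region[enface >= threshold] = color  # only the thresholded part is overlaid
    return rgb


# Synthetic demo: static signal everywhere except one "vessel" column (x = 2)
# whose intensity changes in the middle frame of every set.
sets = np.full((4, 3, 5, 6), 0.8)      # (Y cross sections, N frames, Z depth, X)
sets[:, 1, :, 2] = 0.1
enface = motion_contrast_enface(sets)  # (4, 6) en-face motion contrast
fundus = np.full((10, 12), 0.3)        # stand-in for the first image
overlay = superimpose(fundus, enface, top_left=(3, 4))
```

Thresholding keeps the structural first image visible everywhere except where flow is detected, which is one simple way to make the two kinds of information readable together. Blending with a transparency factor instead of painting solid pixels would be an equally plausible choice.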