Provided is an image forming method for forming an image of an object by combining a plurality of tomographic images acquired by optical coherence tomography, the method including: acquiring, within a first predetermined period, a first three-dimensional image of a first area including a characteristic portion of the object, and first tomographic images of a second area different from the first area, the first tomographic images being a part of the plurality of tomographic images; acquiring, within a second predetermined period, a second three-dimensional image of the first area, and second tomographic images of the second area, the second tomographic images being a part of the plurality of tomographic images and different from the first tomographic images; and aligning positions of the first tomographic images and the second tomographic images by using, as references, the characteristic portion included in the first three-dimensional image and the characteristic portion included in the second three-dimensional image.
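
The alignment step can be illustrated with a minimal sketch. It is not the method's actual implementation: it assumes the motion between the two acquisition periods is a pure integer-voxel translation, that each set of tomographic images shares a coordinate frame with the three-dimensional image acquired in the same period, and that phase correlation is an acceptable stand-in for detecting the characteristic portion. The function names `estimate_shift` and `apply_shift` are hypothetical.

```python
import numpy as np


def estimate_shift(ref_first, ref_second):
    """Estimate the integer shift that maps ref_second onto ref_first.

    Both inputs are 3-D volumes of the same shape covering the first area
    (the area containing the characteristic portion). Phase correlation is
    used here as an assumed stand-in for feature-based matching.
    """
    f_first = np.fft.fftn(ref_first)
    f_second = np.fft.fftn(ref_second)
    # Normalized cross-power spectrum; the small constant avoids division by zero.
    cross = f_first * np.conj(f_second)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifftn(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond the half-way point correspond to negative shifts.
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, corr.shape)
    )


def apply_shift(tomograms, shift):
    """Shift a stack of tomographic images by the estimated displacement.

    A circular shift is a simplification; a real system would pad or crop
    and would likely need sub-voxel interpolation.
    """
    return np.roll(tomograms, shift, axis=(0, 1, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first_volume = rng.random((16, 16, 16))
    # Simulate object motion between the first and second periods.
    second_volume = np.roll(first_volume, (2, -3, 1), axis=(0, 1, 2))
    shift = estimate_shift(first_volume, second_volume)
    print("shift to apply to second-period data:", shift)
```

In this sketch, the shift estimated from the two three-dimensional images of the first area would then be passed to `apply_shift` together with the second tomographic images, so that they line up with the first tomographic images before the two sets are combined.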