A first 3D image and a second 3D image, obtained by imaging a target organ in different phases of respiration, are acquired. A 3D deformation model of the target organ is read; the model is stored in advance, represents nonlinear 3D deformation of the target organ due to respiration, and has been generated based on information about movement of the target organ due to respiration in plural patients. For plural sampled pixels in a target organ region on the first 3D image, the positions of the pixels on the second 3D image that represent the same positions on the target organ are estimated using the displacement, due to the change in phase, of the points on the 3D deformation model corresponding to the positions on the target organ represented by the sampled pixels. Non-rigid alignment is performed between the first 3D image and the second 3D image using the estimated positions of the pixels.
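
The position-estimation step can be illustrated with a minimal sketch. It is not the disclosed implementation and rests on several assumptions: the deformation model is represented as two arrays of matching 3D points (one per respiratory phase), the model has already been brought into the coordinate system of the first 3D image, and each sampled pixel is matched to its nearest model point. The function name `estimate_positions` and all parameter names are hypothetical.

```python
import numpy as np


def estimate_positions(sampled_px, model_pts_phase1, model_pts_phase2):
    """Estimate where sampled pixels of the first image lie in the second image.

    Assumptions (not from the source): the model is given as point sets at the
    two phases, already aligned to the first image, and correspondence is by
    nearest neighbour.
    """
    sampled_px = np.asarray(sampled_px, dtype=float)   # shape (N, 3)
    p1 = np.asarray(model_pts_phase1, dtype=float)     # shape (M, 3)
    p2 = np.asarray(model_pts_phase2, dtype=float)     # shape (M, 3)

    # Squared distance from every sampled pixel to every phase-1 model point.
    d2 = ((sampled_px[:, None, :] - p1[None, :, :]) ** 2).sum(axis=2)

    # Index of the model point corresponding to each sampled pixel.
    nearest = d2.argmin(axis=1)

    # Displacement of those model points caused by the change in phase.
    displacement = p2[nearest] - p1[nearest]

    # Estimated positions of the same anatomical points in the second image.
    return sampled_px + displacement


# Toy data: two model points that move by different amounts along z.
model_phase1 = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
model_phase2 = [[0.0, 0.0, 2.0], [10.0, 0.0, 5.0]]
sampled = [[1.0, 0.0, 0.0], [9.0, 1.0, 0.0]]

estimated = estimate_positions(sampled, model_phase1, model_phase2)
print(estimated)
```

In a full pipeline, the estimated positions would serve as corresponding point pairs or initial values for the non-rigid alignment between the two images; that step is not shown here.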