A first image and a second image are acquired by imaging the same subject with different modalities. The first image is deformed, and the similarity between the deformed first image and the second image is evaluated using an evaluation function that evaluates the correlation between the distributions of corresponding pixel values of the two images, thereby estimating an image deformation amount of the first image. A deformed image of the first image is generated based on the estimated image deformation amount. The evaluation function includes a term representing a measure of correlation between a pixel value of the deformed first image and the corresponding pixel value of the second image, and this term evaluates the correlation based on probability information indicating the probability of each combination of corresponding pixel values of the first image and the second image.
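
The procedure described above can be illustrated with a minimal sketch. It is not the claimed evaluation function. It assumes that the correlation measure is mutual information computed from a joint histogram of corresponding pixel values, that the probability information is a table `prior` with one weight per combination of pixel-value bins, and that the deformation is restricted to a circular integer shift. The names `weighted_mutual_information`, `estimate_shift`, `prior`, `bins`, and `max_shift` are hypothetical and do not appear in the source.

```python
import numpy as np


def weighted_mutual_information(deformed, second, prior=None, bins=16):
    """Mutual information of corresponding pixel values (illustrative only).

    `prior` is a hypothetical (bins x bins) table of weights, one per
    combination of pixel-value bins; it stands in for the probability
    information mentioned in the text.
    """
    a = np.asarray(deformed, dtype=float).ravel()
    b = np.asarray(second, dtype=float).ravel()

    # Joint histogram: counts of each combination of corresponding pixel values.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    if prior is not None:
        # Assumption: down-weight combinations that the prior considers unlikely.
        joint = joint * np.asarray(prior, dtype=float)

    total = joint.sum()
    if total == 0:
        return 0.0

    p_ab = joint / total
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of the deformed image
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of the second image
    nonzero = p_ab > 0
    return float(np.sum(p_ab[nonzero] * np.log(p_ab[nonzero] / (p_a @ p_b)[nonzero])))


def estimate_shift(first, second, max_shift=3, prior=None, bins=16):
    """Exhaustively search integer shifts and keep the best-scoring one.

    A circular shift stands in for the general image deformation; a real
    system would optimize a far richer deformation model.
    """
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            deformed = np.roll(first, (dy, dx), axis=(0, 1))
            score = weighted_mutual_information(deformed, second, prior, bins)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def apply_shift(first, shift):
    """Generate the deformed first image from the estimated deformation amount."""
    return np.roll(first, shift, axis=(0, 1))
```

In this sketch, `estimate_shift(first, second)` returns the estimated deformation amount, and `apply_shift(first, shift)` generates the deformed image from it. A histogram-based measure is used here because images from different modalities generally do not share a linear intensity relationship, so the score depends only on how consistently pixel values co-occur.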