A method of processing images captured using an in vivo capsule camera is disclosed. Input images captured by the in vivo capsule camera are received and used as to-be-processed images. At least one locally-deformed stitched image is generated by applying local deformation to image areas in the vicinity of a seam between two to-be-processed images and stitching the two locally-deformed to-be-processed images. Output images including the at least one locally-deformed stitched image are provided for display or further processing. Generating the at least one locally-deformed stitched image may comprise identifying an optimal seam between the two to-be-processed images and applying the local deformation to the image areas in the vicinity of the optimal seam. Identifying the optimal seam comprises minimizing differences of an object function across the optimal seam. The object function may correspond to the image intensity or a derivative of the image intensity.
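
The seam-identification step can be illustrated with a minimal sketch. It assumes the two to-be-processed images have already been aligned and cropped to a common overlap region of equal size, takes raw image intensity as the object function, and finds a vertical seam by dynamic programming. These choices, and the names `find_seam` and `stitch`, are illustrative assumptions rather than the disclosed implementation. The local deformation step is omitted, so the sketch only shows how a seam with minimal intensity difference might be located and used to composite the two images.

```python
def find_seam(a, b):
    """Return one column index per row for a minimum-cost vertical seam.

    a, b: equally sized 2-D lists of intensities covering the overlap region.
    The seam moves at most one column between consecutive rows.
    """
    rows, cols = len(a), len(a[0])
    # Per-pixel cost: absolute intensity difference between the two images
    # (assumed object function; a gradient difference could be used instead).
    cost = [[abs(a[r][c] - b[r][c]) for c in range(cols)] for r in range(rows)]
    # acc[r][c] = cheapest total cost of any seam ending at pixel (r, c).
    acc = [cost[0][:]]
    for r in range(1, rows):
        prev = acc[-1]
        acc.append([
            cost[r][c] + min(prev[max(c - 1, 0):min(c + 2, cols)])
            for c in range(cols)
        ])
    # Backtrack from the cheapest end point in the last row.
    c = min(range(cols), key=lambda j: acc[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        c = min(range(lo, hi), key=lambda j: acc[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def stitch(a, b, seam):
    """Composite: pixels left of the seam from a, the seam pixel onward from b."""
    return [row_a[:c] + row_b[c:] for row_a, row_b, c in zip(a, b, seam)]


# Toy overlap region: the images agree only along a zig-zag path.
a = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
b = [[9, 2, 9, 9], [9, 9, 3, 9], [9, 2, 9, 9]]
seam = find_seam(a, b)
stitched = stitch(a, b, seam)
```

In this toy input the seam follows the pixels where the two images have identical intensity, so the composite shows no intensity jump at the pixels on the seam. In practice some mismatch usually remains near the seam, which is what the local deformation described above is meant to address.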