A method and apparatus for reconstructing images from an in vivo multi-camera capsule are disclosed. In one embodiment of the present invention, the capsule comprises two cameras with overlapped fields of view (FOVs). Intra-image based pose estimation is applied to the sub-images associated with the overlapped area to improve the pose estimation for the capsule device. In another embodiment, the two images corresponding to the two FOVs are fused using a disparity-adjusted, linear weighted sum of the overlapped sub-images. In yet another embodiment, the images from the multi-camera capsule are stitched for time-space representation.
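
The fusion embodiment can be illustrated with a minimal sketch. The abstract does not say how the disparity is estimated or how the weights are chosen, so everything below is an assumption made for illustration: the function name `fuse_overlap` is hypothetical, the disparity is taken to be a single known integer horizontal shift, indices that fall outside the sub-image are clamped to the edge, and the weights ramp linearly across the width of the overlap.

```python
def fuse_overlap(left, right, disparity=0):
    """Blend two overlapped sub-images given as equal-sized lists of rows.

    Illustrative only: a uniform integer disparity and a linear weight
    ramp are assumptions, not details taken from the abstract.
    """
    if not left or not left[0]:
        raise ValueError("sub-images must be non-empty")
    height = len(left)
    width = len(left[0])
    if len(right) != height or any(len(r) != width for r in left + right):
        raise ValueError("sub-images must have the same shape")

    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # Disparity adjustment (assumed): shift the column index used
            # for the right sub-image, clamping at the image edges.
            xr = min(max(x + disparity, 0), width - 1)
            # Linear weights: the left image dominates at x = 0 and the
            # right image dominates at x = width - 1.
            w_right = x / (width - 1) if width > 1 else 0.5
            w_left = 1.0 - w_right
            row.append(w_left * left[y][x] + w_right * right[y][xr])
        fused.append(row)
    return fused
```

For example, `fuse_overlap([[10.0, 10.0, 10.0]], [[20.0, 20.0, 20.0]])` returns `[[10.0, 15.0, 20.0]]`: the output moves smoothly from the left image's value to the right image's value across the overlap. A practical system would more likely use a per-pixel disparity map and sub-pixel interpolation.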