Systems and methods are presented for assisting in consistently aligning a handheld intra-oral image capture device across a series of images. Live image data of a patient is received from the intra-oral image capture device and displayed on a display. A previously stored intra-oral image of the patient is accessed from a non-transitory memory, and an alignment mask is generated based on the accessed image. The system determines whether the live image data is aligned with the alignment mask. In response to determining that the live image data is aligned with the alignment mask, the system automatically captures a new intra-oral image of the patient from the live image data and stores the new intra-oral image to the non-transitory memory, displays the new intra-oral image on the display, or both.
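The sequence described above (generate a mask from a stored image, compare live frames against it, capture on alignment) can be illustrated with a minimal sketch. The abstract does not say how the alignment mask is generated or how alignment is determined, so everything specific here is an assumption made for illustration: the mask is taken to be a simple edge map of the stored image, alignment is scored as the Dice overlap between that mask and the edge map of each live frame, and the names edge_mask, alignment_score, and auto_capture, along with the threshold values, are hypothetical. Images are modeled as plain lists of grayscale rows rather than real device output.

```python
from typing import Iterable, List, Optional

Image = List[List[int]]  # grayscale rows, pixel values 0-255 (assumed format)
Mask = List[List[bool]]


def edge_mask(image: Image, threshold: int = 50) -> Mask:
    """Assumed mask: mark pixels whose intensity step from the left or
    upper neighbor exceeds the threshold."""
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(image[y][x] - image[y][x - 1]) if x > 0 else 0
            dy = abs(image[y][x] - image[y - 1][x]) if y > 0 else 0
            mask[y][x] = max(dx, dy) > threshold
    return mask


def alignment_score(mask: Mask, frame: Image, threshold: int = 50) -> float:
    """Assumed metric: Dice overlap between the alignment mask and the
    edge map of a live frame (1.0 means the edges coincide exactly)."""
    live = edge_mask(frame, threshold)
    both = sum(m and l for mask_row, live_row in zip(mask, live)
               for m, l in zip(mask_row, live_row))
    total = sum(map(sum, mask)) + sum(map(sum, live))
    return 2.0 * both / total if total else 0.0


def auto_capture(stored: Image, live_frames: Iterable[Image],
                 min_score: float = 0.9) -> Optional[Image]:
    """Return a copy of the first live frame aligned with the mask built
    from the stored image, or None if no frame qualifies."""
    mask = edge_mask(stored)
    for frame in live_frames:
        if alignment_score(mask, frame) >= min_score:
            return [row[:] for row in frame]  # the "captured" image
    return None
```

In this sketch the caller would store or display the returned image; a frame stream that never reaches the assumed min_score simply yields None, which mirrors the abstract's condition that capture happens only in response to a positive alignment determination.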