In an endoscope system, an insertion amount of an insertion unit is detected based on camera images captured by two cameras provided on a mouthpiece. Then, past images within a predetermined range corresponding to the detected insertion amount are acquired from a past image storage unit. A current image is compared with each of the acquired past images to calculate a similarity for each one. The body part captured in the past image having the highest similarity to the current image is determined to be the body part captured in the current image.
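The abstract does not specify the similarity measure, the width of the predetermined range, or how past images are stored. The following Python sketch shows one possible form of the selection-and-matching step under stated assumptions. Each stored past image is tagged with its insertion amount and body part. The predetermined range is a symmetric window of `window_mm` around the detected insertion amount. Similarity is normalized cross-correlation (NCC) on flattened grayscale pixels. All names (`PastImage`, `ncc`, `identify_body_part`, `window_mm`) are hypothetical. The insertion amount is taken as an input, because the two-camera detection on the mouthpiece is not modeled here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


@dataclass(frozen=True)
class PastImage:
    """Hypothetical stored record: a past frame tagged with the insertion
    amount at capture time and the body part it shows."""
    insertion_mm: float
    body_part: str
    pixels: Sequence[float]  # flattened grayscale intensities


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation in [-1, 1] (an assumed similarity
    measure); returns 0.0 if either image has zero variance."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def identify_body_part(
    current: Sequence[float],
    insertion_mm: float,
    store: Sequence[PastImage],
    window_mm: float = 50.0,  # assumed half-width of the predetermined range
) -> Optional[str]:
    # Step 1: keep only past images whose insertion amount lies within
    # +/- window_mm of the detected insertion amount.
    candidates = [
        p for p in store if abs(p.insertion_mm - insertion_mm) <= window_mm
    ]
    if not candidates:
        return None
    # Step 2: score every candidate against the current image.
    # Step 3: return the body part of the highest-scoring candidate.
    best = max(candidates, key=lambda p: ncc(current, p.pixels))
    return best.body_part
```

Restricting candidates by insertion amount before matching means an image that looks similar but was captured at a very different depth is never considered. This narrows the search and avoids confusing visually similar regions at different positions along the tract.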