An endoscope apparatus includes a processor. The processor acquires motion information representing a relative motion between an imaging section and an object, and determines, based on the motion information, whether or not to perform a focus operation that causes the imaging section to bring the object into focus. The processor obtains, based on the motion information, global motion information representing a global relative motion between the imaging section and the object, determines global motion information reliability, which is the reliability of the global motion information, and determines whether or not to perform the focus operation based on two or more frame images including a first frame image corresponding to a high reliability frame before a low reliability frame and a second frame image corresponding to a high reliability frame after the low reliability frame.
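
The decision flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it (`Frame`, `global_motion`, `is_reliable`, `should_focus`), both thresholds, the averaging of local vectors, the spread-based reliability test, and the pixel-difference comparison are assumptions introduced only for illustration; the abstract does not specify how reliability is computed or how the two frame images are compared.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, float]
Image = Sequence[float]  # flattened grayscale pixels (assumed representation)


@dataclass
class Frame:
    image: Image
    local_motion: List[Vector]  # per-block motion vectors (assumed, non-empty)


def global_motion(frame: Frame) -> Vector:
    """Average the local vectors into one global motion vector (assumption)."""
    xs = [v[0] for v in frame.local_motion]
    ys = [v[1] for v in frame.local_motion]
    return (mean(xs), mean(ys))


def is_reliable(frame: Frame, max_spread: float = 2.0) -> bool:
    """Assumed criterion: local vectors must agree with the global vector."""
    gx, gy = global_motion(frame)
    spread = mean(abs(vx - gx) + abs(vy - gy) for vx, vy in frame.local_motion)
    return spread <= max_spread


def image_difference(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    return mean(abs(p - q) for p, q in zip(a, b))


def should_focus(frames: Sequence[Frame], diff_threshold: float = 10.0) -> bool:
    """Compare the high reliability frames on either side of a low reliability run."""
    last_reliable: Optional[Frame] = None
    gap_seen = False  # True once a low reliability frame follows last_reliable
    for frame in frames:
        if not is_reliable(frame):
            gap_seen = True
            continue
        if gap_seen and last_reliable is not None:
            # First frame image: last_reliable; second frame image: frame.
            if image_difference(last_reliable.image, frame.image) > diff_threshold:
                return True
        last_reliable = frame
        gap_seen = False
    return False


steady = Frame(image=[10, 10, 10, 10], local_motion=[(1, 1), (1, 1)])
noisy = Frame(image=[0, 0, 0, 0], local_motion=[(10, -10), (-10, 10)])
changed = Frame(image=[50, 50, 50, 50], local_motion=[(0, 0), (0, 0)])
focus_needed = should_focus([steady, noisy, changed])
```

In this sketch the low reliability frame itself never contributes to the decision; only the high reliability frames that bracket it are compared, which is the one aspect taken directly from the abstract.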