An endoscope system includes an endoscope and an image processing device attached to each other. The image processing device includes at least one processor configured to perform operations of determining an operator's action based on an action signal from the endoscope inserted into a subject body, deciding whether an image is set as a detection target image based on the operator's action, and detecting a specific region from the detection target image. The processor determines whether the operator's action at the time of capturing the image is a treatment action that gives the subject body a treatment. Furthermore, the processor detects, from the image, a region that exhibits specular reflection and whose area and position change greatly over time as a washed region, and determines that the operator's action at the time of capturing the image is the treatment action when the washed region is detected.
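The washed-region test described above can be illustrated with a minimal sketch. It is not the claimed implementation: the brightness cutoff, the area and position thresholds, the use of a single centroid per frame, and all function names are hypothetical choices made for illustration. The sketch also assumes that a frame captured during a treatment action is not set as a detection target image, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image, pixel values 0..255

# All thresholds below are assumed values, not taken from the source.
SPECULAR_THRESHOLD = 240   # pixels at or above this count as specular
AREA_CHANGE_RATIO = 0.5    # relative area change considered "large"
CENTROID_SHIFT_PX = 2.0    # centroid displacement considered "large"


@dataclass
class SpecularStats:
    area: int                      # number of specular pixels
    centroid: Tuple[float, float]  # (row, column) mean position


def specular_stats(frame: Frame) -> Optional[SpecularStats]:
    """Summarize the specular pixels of one frame, or None if there are none."""
    count = row_sum = col_sum = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= SPECULAR_THRESHOLD:
                count += 1
                row_sum += y
                col_sum += x
    if count == 0:
        return None
    return SpecularStats(count, (row_sum / count, col_sum / count))


def is_washed_region(prev: Optional[SpecularStats],
                     curr: Optional[SpecularStats]) -> bool:
    """True when the specular region changes greatly in both area and position."""
    if prev is None or curr is None:
        return False
    area_change = abs(curr.area - prev.area) / max(prev.area, 1)
    shift = hypot(curr.centroid[0] - prev.centroid[0],
                  curr.centroid[1] - prev.centroid[1])
    return area_change >= AREA_CHANGE_RATIO and shift >= CENTROID_SHIFT_PX


def is_treatment_action(prev_frame: Frame, curr_frame: Frame) -> bool:
    """A detected washed region is taken to indicate a treatment action."""
    return is_washed_region(specular_stats(prev_frame),
                            specular_stats(curr_frame))


def is_detection_target(prev_frame: Frame, curr_frame: Frame) -> bool:
    """Assumption: frames captured during a treatment action are skipped."""
    return not is_treatment_action(prev_frame, curr_frame)
```

Comparing only two consecutive frames keeps the sketch short; a practical system would likely track individual connected regions over a longer window before deciding that a region is a washed region.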