An endoscope system includes: a generating means that generates a compositing mask on the basis of the ratios of contrasts between a pair of images, the pair being acquired by simultaneously imaging two optical images that have different focus positions and into which a subject image is divided, the compositing mask serving as the compositing ratios of corresponding pixels of the pair of images; a correcting means that subjects the compositing masks generated for pairs of images acquired in time series to weighted averaging for respective pixels, thereby generating a corrected mask; and a compositing means that composites the two images according to the corrected mask. The correcting means performs the weighted averaging with weights such that the proportion of the past compositing masks is higher at pixels constituting a static area and an area having contrast lower than a threshold than at pixels constituting a moving-object area or an area having contrast equal to or higher than the threshold.
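The three means described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the specific ratio formula a / (a + b), the use of a single past mask, and the weight values 0.75 and 0.25 are all assumptions chosen for illustration, and per-pixel contrast maps and a static/moving classification are assumed to be supplied from elsewhere.

```python
# Illustrative sketch only; names, formula, and weight values are assumptions.

def generate_mask(contrast_a, contrast_b, eps=1e-6):
    """Per-pixel compositing ratio for image A, derived from the contrast ratio.

    Assumed formula: a / (a + b); eps avoids division by zero in flat areas.
    """
    return [[a / (a + b + eps) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(contrast_a, contrast_b)]


def correct_mask(current, past, static, contrast, threshold,
                 w_past_high=0.75, w_past_low=0.25):
    """Weighted average of the current and past masks for each pixel.

    The past mask gets the higher weight where the pixel is static AND its
    contrast is below the threshold; elsewhere (moving object OR contrast at
    or above the threshold) it gets the lower weight.
    """
    out = []
    for cur_row, past_row, st_row, c_row in zip(current, past, static, contrast):
        row = []
        for cur, old, is_static, c in zip(cur_row, past_row, st_row, c_row):
            w = w_past_high if (is_static and c < threshold) else w_past_low
            row.append(w * old + (1.0 - w) * cur)
        out.append(row)
    return out


def composite(image_a, image_b, mask):
    """Blend the two differently focused images using the corrected mask."""
    return [[m * a + (1.0 - m) * b for a, b, m in zip(ra, rb, rm)]
            for ra, rb, rm in zip(image_a, image_b, mask)]
```

In this sketch, a static, low-contrast pixel follows the past mask closely, which suppresses frame-to-frame flicker caused by noisy contrast estimates, while a moving or high-contrast pixel follows the current mask so that the composite tracks the scene.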