An image processing apparatus is configured to: perform averaging processing on pixel values of pixels having different color filters to obtain a signal value; generate motion detection images based on the signal value such that, in white light imaging (WLI), a weight of a pixel value for a filter that passes light of a luminance component of a captured image in WLI is larger than or equal to a weight of a pixel value for a different filter, and, in narrow band imaging (NBI), a weight of a pixel value for a filter that passes light of a luminance component of a captured image in NBI is larger than or equal to a weight of a pixel value for a different filter; and detect motion between two of the motion detection images generated from captured images at different points in time.
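The weighted averaging and motion detection described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the weight values, the choice of G as the luminance filter in WLI and B in NBI, the 2x2 filter block, and the exhaustive shift search are all assumptions made for illustration, and every name (`WEIGHTS`, `weighted_signal`, `motion_detection_image`, `detect_motion`) is hypothetical.

```python
# Hypothetical weights. The text only requires that, in each mode, the weight of
# the luminance filter is larger than or equal to the weight of a different filter.
WEIGHTS = {
    "WLI": {"R": 1.0, "G": 2.0, "B": 1.0},  # assumption: G carries luminance in WLI
    "NBI": {"R": 0.0, "G": 1.0, "B": 2.0},  # assumption: B carries luminance in NBI
}


def weighted_signal(pixels, mode):
    """Weighted average of (filter_name, pixel_value) pairs for one block."""
    w = WEIGHTS[mode]
    total_weight = sum(w[name] for name, _ in pixels)
    return sum(w[name] * value for name, value in pixels) / total_weight


def motion_detection_image(raw, cfa, mode):
    """Collapse each 2x2 block of the raw frame into one signal value.

    raw: 2D list of pixel values; cfa: 2x2 list of filter names (assumed layout).
    """
    out = []
    for y in range(0, len(raw) - 1, 2):
        row = []
        for x in range(0, len(raw[0]) - 1, 2):
            block = [(cfa[dy][dx], raw[y + dy][x + dx])
                     for dy in (0, 1) for dx in (0, 1)]
            row.append(weighted_signal(block, mode))
        out.append(row)
    return out


def detect_motion(prev, curr, search=1):
    """Return the (dy, dx) shift from prev to curr with the lowest mean
    absolute difference over the overlapping region (exhaustive search)."""
    rows, cols = len(prev), len(prev[0])
    best_cost, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            diffs = [abs(curr[y][x] - prev[y - dy][x - dx])
                     for y in range(max(0, dy), min(rows, rows + dy))
                     for x in range(max(0, dx), min(cols, cols + dx))]
            cost = sum(diffs) / len(diffs)
            if best_cost is None or cost < best_cost:
                best_cost, best_shift = cost, (dy, dx)
    return best_shift
```

In this sketch, two frames captured at different points in time would each be passed through `motion_detection_image` with the current mode, and the two results compared with `detect_motion`. Switching `mode` changes only the weights, so the same search runs on whichever filter carries the luminance component in that mode.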