A system may identify the locations of objects of interest in a captured image by processing image data associated with the captured image using neural networks. The image data may be generated by an image sensor, which may be part of an imaging system. A cascade segmentation artificial intelligence that includes multiple neural networks may be used to process the image data in order to determine the locations of the objects of interest in the captured image. Post-processing may be performed on outputs of the cascade segmentation artificial intelligence to generate a mask corresponding to the locations of the objects of interest. The mask may be superimposed over the captured image to produce an output image, which may then be presented on a display.
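
The data flow described above (cascade of networks, post-processing into a mask, superimposition on the captured image) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. The two stage functions are hypothetical stand-ins for trained neural networks, "cascade" is assumed to mean that each stage receives the image together with the previous stage's output, post-processing is assumed to be a simple threshold, and superimposition is assumed to be alpha blending. All names (`coarse_stage`, `refine_stage`, `run_cascade`, `postprocess`, `overlay`) are invented for this example.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]    # grayscale image, pixel intensities in [0, 1]
ProbMap = List[List[float]]  # per-pixel score that the pixel belongs to an object
Mask = List[List[int]]       # binary mask, 1 marks an object of interest
Stage = Callable[[Image, Optional[ProbMap]], ProbMap]


def coarse_stage(image: Image, prior: Optional[ProbMap]) -> ProbMap:
    """Hypothetical first network: scores each pixel by its brightness."""
    return [[px for px in row] for row in image]


def refine_stage(image: Image, prior: Optional[ProbMap]) -> ProbMap:
    """Hypothetical second network: smooths the prior scores along each row."""
    if prior is None:
        raise ValueError("refine_stage needs the output of an earlier stage")
    refined = []
    for row in prior:
        smoothed = []
        for i in range(len(row)):
            window = row[max(0, i - 1): i + 2]  # pixel plus horizontal neighbours
            smoothed.append(sum(window) / len(window))
        refined.append(smoothed)
    return refined


def run_cascade(image: Image, stages: Sequence[Stage]) -> ProbMap:
    """Run the stages in order, feeding each one the previous stage's output."""
    prob: Optional[ProbMap] = None
    for stage in stages:
        prob = stage(image, prob)
    if prob is None:
        raise ValueError("cascade needs at least one stage")
    return prob


def postprocess(prob: ProbMap, threshold: float = 0.5) -> Mask:
    """Assumed post-processing: threshold the scores into a binary mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


def overlay(image: Image, mask: Mask, alpha: float = 0.5,
            highlight: float = 1.0) -> Image:
    """Superimpose the mask by blending masked pixels toward a highlight value."""
    return [
        [(1 - alpha) * px + alpha * highlight if m else px
         for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 3x4 "captured image" with a bright two-pixel object in the middle row.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
mask = postprocess(run_cascade(image, [coarse_stage, refine_stage]))
output_image = overlay(image, mask)
```

In this toy run the mask marks only the two bright pixels in the middle row, and `output_image` is the original image with those pixels brightened. In a real system the stage functions would wrap trained segmentation networks, and `output_image` would be handed to whatever display path the imaging system uses.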