Light-field data is masked to identify regions of interest before being applied to deep learning models. In one approach, a light-field camera captures a light-field image of an object to be classified. The light-field image includes a plurality of views of the object taken simultaneously from different viewpoints. The light-field image is pre-processed, and the resulting data is provided as input to a deep learning model. The pre-processing includes determining and then applying masks that select regions of interest within the light-field data. In this way, less relevant data can be excluded from the input to the deep learning model. Based on the masked data, the deep learning model produces a decision classifying the object.
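The masking step can be illustrated with a minimal sketch. It assumes the light-field image is stored as a NumPy array of shape (views, height, width) and uses a simple intensity threshold to determine the mask. The threshold rule, the array layout, and the names determine_mask and apply_mask are illustrative assumptions, not taken from the text above, which does not specify how masks are determined.

```python
import numpy as np


def determine_mask(views: np.ndarray, rel_threshold: float = 0.5) -> np.ndarray:
    """Return a boolean (H, W) mask marking a region of interest.

    Assumption: a pixel is of interest when its mean intensity across
    all views exceeds rel_threshold times the maximum mean intensity.
    """
    mean_view = views.mean(axis=0)  # average over the view axis -> (H, W)
    return mean_view > rel_threshold * mean_view.max()


def apply_mask(views: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out pixels outside the region of interest in every view."""
    # Broadcasting applies the same (H, W) mask to each of the V views.
    return views * mask[np.newaxis, :, :]


if __name__ == "__main__":
    # Synthetic light field: 4 views of a 6x6 scene, dim background,
    # bright 2x2 object in the centre.
    views = np.full((4, 6, 6), 0.1)
    views[:, 2:4, 2:4] = 1.0

    mask = determine_mask(views)
    masked = apply_mask(views, mask)

    # The masked array keeps the input shape and would be passed on
    # as input to the deep learning model.
    print(mask.sum(), masked.shape)
```

Applying one shared mask to every view keeps the views spatially aligned. A per-view mask of shape (views, height, width) would be an equally plausible design if the region of interest shifts between viewpoints.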