An image sensor in a first earcup captures an image of a pinna. A transducer in a second earcup located at the pinna outputs first sound, and each of one or more microphones in the second earcup detects respective second sound. Based on the captured image and the respective second sound detected by each of the one or more microphones, a non-linear transfer function is determined that characterizes how the pinna transforms sound. A signal indicative of one or more audio cues for spatializing third sound is then generated based on the determined non-linear transfer function.
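
The relationship between the first sound output by the transducer and the second sound detected at a microphone can be illustrated with a minimal sketch. It is not the method described above: it substitutes a simple linear, regularized frequency-domain estimate for the non-linear transfer function and ignores the captured image entirely. All names (`estimate_linear_response`, `apply_response`, `toy_pinna`) and the three-tap toy filter standing in for the pinna are assumptions made for illustration only.

```python
import numpy as np

def estimate_linear_response(first_sound, second_sound, eps=1e-12):
    """Estimate a linear frequency response H = Y * conj(X) / (|X|^2 + eps).

    Illustrative stand-in only: the abstract describes a non-linear transfer
    function that also depends on a captured image, which is not modelled here.
    """
    X = np.fft.rfft(first_sound)    # spectrum of the sound output by the transducer
    Y = np.fft.rfft(second_sound)   # spectrum of the sound detected by a microphone
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

def apply_response(third_sound, response):
    """Filter a third sound with the estimated response (circular convolution).

    Assumes third_sound has the same length as the signals used for estimation.
    """
    n = len(third_sound)
    return np.fft.irfft(np.fft.rfft(third_sound) * response, n)

# Hypothetical data: white noise as the first sound, and a toy 3-tap filter
# standing in for the effect of the pinna on that sound.
rng = np.random.default_rng(0)
n = 1024
first_sound = rng.standard_normal(n)
toy_pinna = np.array([1.0, 0.5, -0.25])
second_sound = np.fft.irfft(
    np.fft.rfft(first_sound) * np.fft.rfft(toy_pinna, n), n
)

response = estimate_linear_response(first_sound, second_sound)

# Apply the estimated response to a third sound to produce a filtered signal.
third_sound = rng.standard_normal(n)
cued_sound = apply_response(third_sound, response)
```

In this toy setup the estimated response closely matches the spectrum of the three-tap filter, because the simulated second sound is an exact circular convolution of the first sound with that filter. A real measurement would include noise, multiple microphones, and the non-linear and image-based components that this sketch omits.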