Examples are disclosed that relate to a head-mounted device configured to perform virtual echolocation. The head-mounted device is configured to: cast an array of rays at specified angles from a position derived from a pose of the head-mounted device in a physical environment; identify a plurality of intersection points of the rays with a virtual model of the physical environment; for each identified intersection point, modify an audio signal based on a head-related transfer function corresponding to the intersection point to produce a plurality of spatialized audio signals; for each spatialized audio signal, determine a time-of-flight adjustment based upon a distance between the corresponding intersection point and the position from which the rays were cast; and output each spatialized audio signal to one or more speakers with a delay based on the time-of-flight adjustment.
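
The ray-casting and time-of-flight steps may be illustrated by a minimal sketch under stated assumptions. The abstract fixes no geometry or formula, so everything named here is hypothetical: the virtual model is reduced to a single vertical wall at `x = wall_x` in two dimensions, the delay is taken as the one-way distance divided by an assumed speed of sound of 343 m/s, and the function names are illustrative only. The head-related transfer function step is omitted.

```python
import math

# Assumed value: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def cast_ray_2d(origin, angle_rad, wall_x):
    """Intersect a 2D ray with a vertical wall at x = wall_x.

    The wall is a hypothetical stand-in for the virtual model of the
    physical environment. Returns the hit point, or None on a miss.
    """
    dx, dy = math.cos(angle_rad), math.sin(angle_rad)
    if abs(dx) <= 1e-9:
        return None  # ray runs parallel to the wall
    t = (wall_x - origin[0]) / dx
    if t < 0:
        return None  # wall lies behind the ray origin
    return (origin[0] + t * dx, origin[1] + t * dy)


def time_of_flight_delay(origin, hit, speed=SPEED_OF_SOUND_M_S):
    """Delay in seconds for the one-way distance from origin to hit.

    Assumption: one-way travel. A round-trip model would double this.
    """
    return math.dist(origin, hit) / speed


def echolocation_delays(origin, angles_deg, wall_x):
    """Cast one ray per angle and return (angle, hit, delay) per hit."""
    results = []
    for angle in angles_deg:
        hit = cast_ray_2d(origin, math.radians(angle), wall_x)
        if hit is not None:
            results.append((angle, hit, time_of_flight_delay(origin, hit)))
    return results
```

Calling `echolocation_delays((0.0, 0.0), [0, 60, 180], 3.43)` would yield entries only for the rays that reach the wall, each paired with the delay by which the corresponding spatialized audio signal could be offset before output.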