The present invention concerns a method of transforming raw visual data, 3D (depth) visual data, or an interpreted, visually perceived 3D scene into acoustic signals for aiding visually impaired or blind persons. The method comprises the steps of: capturing visual environment data with at least one vision sensor unit, wherein the at least one vision sensor unit is formed as an event-based vision sensor; transforming the captured visual environment data into acoustic signals; and outputting the acoustic signals to the visually impaired or blind person through at least one audio output unit. A further aspect of the present invention concerns an aid device comprising means for carrying out the method of the present invention.
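The pipeline described above could be sketched as follows. This is a minimal, hypothetical illustration only: the sensor resolution, frequency range, and the event-to-tone mapping are all assumptions introduced here, not details disclosed by the invention. An event-based vision sensor emits sparse (x, y, timestamp, polarity) events, and each event is mapped to a tone whose pitch and stereo position encode the event's location in the scene.

```python
from dataclasses import dataclass

# Assumed parameters (illustrative, not part of the disclosed method):
SENSOR_W, SENSOR_H = 346, 260   # assumed event-camera resolution
F_MIN, F_MAX = 200.0, 2000.0    # assumed audible frequency range in Hz


@dataclass
class Event:
    """One sparse output sample of an event-based vision sensor."""
    x: int          # pixel column
    y: int          # pixel row
    t_us: int       # timestamp in microseconds
    polarity: bool  # True = brightness increase, False = decrease


def sonify(ev: Event) -> tuple[float, float]:
    """Map one event to (frequency_hz, stereo_pan in [-1, 1]).

    Vertical position selects pitch (events nearer the top of the
    image map to higher frequencies); horizontal position selects
    left/right panning, so the listener can localize where in the
    visual field the event occurred.
    """
    # Row 0 is the top of the image: map it to the highest frequency.
    frac_up = 1.0 - ev.y / (SENSOR_H - 1)
    freq = F_MIN + frac_up * (F_MAX - F_MIN)
    # Column 0 pans fully left (-1.0), the last column fully right (+1.0).
    pan = 2.0 * ev.x / (SENSOR_W - 1) - 1.0
    return freq, pan
```

The resulting (frequency, pan) pairs would then be rendered by an audio output unit, e.g. stereo headphones; a practical system would additionally handle event rate limiting and temporal smoothing, which are omitted here for brevity.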