In an example, a computer-implemented method receives one or more user inputs and, using sound source localization, captures a sound associated with a sound source via one or more capturing devices. The method estimates one or more first posterior likelihoods of one or more positions of the sound source based on the one or more user inputs, and a second posterior likelihood of a position of the sound source based on the captured sound. The method then estimates an overall posterior likelihood of an actual position of the sound source based on 1) the one or more first posterior likelihoods and 2) the second posterior likelihood.
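
As an illustration only, the final combining step might be sketched as follows. The text does not specify how the posterior likelihoods are combined, so this sketch assumes a shared discrete set of candidate positions, a uniform prior, and conditional independence between the user inputs and the captured sound, in which case the overall posterior is proportional to the element-wise product of the individual posteriors. The function names (`normalize`, `fuse_posteriors`), the candidate positions, and all numeric values are hypothetical.

```python
from math import prod


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def fuse_posteriors(user_input_posteriors, sound_posterior):
    """Combine per-source posteriors over the same candidate positions.

    Assumption (not stated in the text): uniform prior and conditionally
    independent observations, so the fused posterior is proportional to
    the element-wise product of the individual posteriors.
    """
    all_posteriors = list(user_input_posteriors) + [sound_posterior]
    n = len(sound_posterior)
    if any(len(p) != n for p in all_posteriors):
        raise ValueError("all posteriors must cover the same positions")
    product = [prod(p[i] for p in all_posteriors) for i in range(n)]
    return normalize(product)


# Hypothetical candidate positions and posterior values.
positions = ["left", "center", "right"]
user_posteriors = [[0.2, 0.5, 0.3]]   # first posterior(s), from user inputs
sound_posterior = [0.1, 0.6, 0.3]     # second posterior, from captured sound

overall = fuse_posteriors(user_posteriors, sound_posterior)

# Pick the candidate position with the highest overall posterior.
best_index = max(range(len(positions)), key=overall.__getitem__)
best = positions[best_index]
print(best, [round(p, 3) for p in overall])
```

Under these assumptions, a position favored by both the user inputs and the captured sound receives a larger share of the overall posterior than either source assigns alone. Other combination rules, such as a weighted product or a mixture, would fit the same description.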