A sound source-separating device includes a sound-collecting part, an imaging part, a sound signal-evaluating part, and an image signal-evaluating part. The device further includes: a selection part that selects whether to estimate a sound source direction based on a first sound signal or a first image signal; a person position-estimating part that estimates the sound source direction using the first image signal; a sound source direction-estimating part that estimates the sound source direction; a sound source-separating part that extracts, from the first sound signal, a second sound signal corresponding to the sound source direction; an image-extracting part that extracts, from the first image signal, a second image signal of an area corresponding to the estimated sound source direction; and an image-combining part that changes a third image signal of an area other than the area of the second image signal and combines the third image signal with the second image signal.
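The selection, image-extraction, and image-combining steps described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the "higher evaluation score wins" selection rule, the linear azimuth-to-column camera model, the 60-degree field of view, and the choice of darkening as the "change" applied to the remaining area are all assumptions made for illustration. Sound source separation itself is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def select_modality(sound_score: float, image_score: float) -> str:
    """Choose which signal drives direction estimation.

    Assumed rule (not stated in the abstract): the signal with the
    higher evaluation score is used; ties go to the sound signal.
    """
    return "sound" if sound_score >= image_score else "image"


def direction_to_columns(azimuth_deg: float, width: int,
                         fov_deg: float = 60.0,
                         half_width: int = 1) -> Tuple[int, int]:
    """Map an azimuth (0 = image center) to a half-open column range.

    Assumes a linear mapping from [-fov/2, +fov/2] onto the columns
    [0, width - 1]; a real camera model would differ.
    """
    frac = (azimuth_deg + fov_deg / 2.0) / fov_deg
    frac = min(max(frac, 0.0), 1.0)  # clamp directions outside the view
    center = round(frac * (width - 1))
    return max(center - half_width, 0), min(center + half_width + 1, width)


def combine(image: Image, cols: Tuple[int, int]) -> Image:
    """Keep the extracted area unchanged and darken every other pixel.

    Halving the pixel value stands in for the unspecified "change"
    applied to the area outside the extracted one.
    """
    start, end = cols
    return [
        [px if start <= x < end else px // 2 for x, px in enumerate(row)]
        for row in image
    ]
```

For example, `combine(frame, direction_to_columns(0.0, len(frame[0])))` keeps the columns around the image center at full brightness and dims the rest, so the area corresponding to the estimated sound source direction stands out in the combined image.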