A three-dimensional endoscope comprises: a video signal input portion to which a first video signal obtained by a first imaging system and a second video signal obtained by a second imaging system are inputted; a video signal identification portion that identifies a two-dimensional video signal and a three-dimensional video signal obtained from the video signal input portion; an image condition detection portion that, when the video signal identification portion has detected the two-dimensional video signal, analyzes a display area of a two-dimensional image to detect a foggy region; and an image combining portion that, when the image condition detection portion has detected a foggy region in a video of at least one of the first imaging system and the second imaging system, combines the first video signal and the second video signal to generate a composite image in which the fogginess of the foggy region is eliminated.
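
The detection and combining steps can be illustrated with a minimal sketch. The abstract does not say how fog is detected or how the two signals are combined, so everything here is an assumption made for illustration: frames are small grayscale grids, a block counts as foggy when its local contrast (maximum minus minimum) falls below a threshold, and the composite is formed by copying the clear block from the other imaging system. The names `foggy_blocks` and `combine` and the parameters `block` and `min_contrast` are hypothetical, parallax between the two imaging systems is ignored, and the signal identification step is left out.

```python
from typing import List, Set, Tuple

Image = List[List[float]]  # grayscale rows (hypothetical frame representation)


def foggy_blocks(img: Image, block: int = 2,
                 min_contrast: float = 10.0) -> Set[Tuple[int, int]]:
    """Return the top-left (row, col) of every block with low local contrast.

    Assumption: fog flattens contrast, so max - min below `min_contrast`
    is treated as a foggy region. This is not the method in the source.
    """
    foggy = set()
    for r in range(0, len(img), block):
        for c in range(0, len(img[0]), block):
            vals = [v for row in img[r:r + block] for v in row[c:c + block]]
            if max(vals) - min(vals) < min_contrast:
                foggy.add((r, c))
    return foggy


def combine(first: Image, second: Image, block: int = 2,
            min_contrast: float = 10.0) -> Image:
    """Build a composite from `first`, patching blocks that are foggy in
    `first` but clear in `second` with the pixels from `second`."""
    fog_first = foggy_blocks(first, block, min_contrast)
    fog_second = foggy_blocks(second, block, min_contrast)
    out = [row[:] for row in first]  # copy so the input frame is untouched
    for r, c in fog_first - fog_second:
        for i in range(r, min(r + block, len(first))):
            out[i][c:c + block] = second[i][c:c + block]
    return out


# Usage: the left 2x2 block of `first` is flat (foggy); `second` is clear there.
first = [[50.0, 50.0, 0.0, 100.0],
         [50.0, 50.0, 100.0, 0.0]]
second = [[0.0, 90.0, 0.0, 100.0],
          [90.0, 0.0, 100.0, 0.0]]
composite = combine(first, second)
```

In this sketch, blocks that are foggy in both frames are left unchanged, since neither imaging system offers a clear view of them.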