A voice processing method and apparatus, and a terminal device. The method comprises: playing audio/video, acquiring a voice signal input by a user, storing the voice signal as tested audio, and suspending playback of the audio/video (S101); acquiring audio data of the audio/video from before the moment at which playback was suspended, and storing the audio data as standard audio (S102); comparing the tested audio with the standard audio to obtain a similarity between the two (S103); and displaying the similarity to the user (S104). Further disclosed are a voice processing apparatus and a terminal device containing the voice processing apparatus. The technical solution allows a user's pronunciation to be compared promptly with the teaching pronunciation, so that the user can correct the pronunciation and self-study more efficiently.
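
The comparison in step S103 might be sketched as follows. The abstract does not state which similarity measure is used, so this example assumes one purely for illustration: the cosine similarity between the short-time energy envelopes of the two recordings. The function names `energy_envelope` and `similarity`, the `frames` parameter, and the representation of audio as a plain list of float samples are all hypothetical and do not come from the source.

```python
import math
from typing import List, Sequence


def energy_envelope(samples: Sequence[float], frames: int = 20) -> List[float]:
    """Split the signal into `frames` roughly equal chunks and return each chunk's RMS."""
    if not samples:
        raise ValueError("samples must not be empty")
    n = len(samples)
    envelope = []
    for i in range(frames):
        start = i * n // frames
        # Guarantee at least one sample per chunk, even for very short signals.
        end = max(start + 1, (i + 1) * n // frames)
        chunk = samples[start:end]
        envelope.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return envelope


def similarity(tested: Sequence[float], standard: Sequence[float], frames: int = 20) -> float:
    """Assumed measure: cosine similarity of the two energy envelopes, in [0, 1]."""
    a = energy_envelope(tested, frames)
    b = energy_envelope(standard, frames)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0  # at least one recording is pure silence
    return min(1.0, dot / norm)  # clamp tiny floating-point overshoot


if __name__ == "__main__":
    # Synthetic stand-ins for the standard audio (S102) and the tested audio (S101).
    standard_audio = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
    tested_audio = [0.5 * x for x in standard_audio]  # same shape, quieter
    score = similarity(tested_audio, standard_audio)
    # S104: show the result to the user, here as a percentage.
    print(f"Similarity: {score * 100:.0f}%")
```

Because the envelopes are resampled to a fixed number of frames, the two recordings need not have the same length, and because cosine similarity ignores overall scale, a quieter but otherwise matching utterance still scores highly. A practical system would more likely compare spectral features with time alignment; this sketch only illustrates the compare-then-display flow.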