In one embodiment, a computing device can detect an utterance of a target phrase within an acoustic input signal. The computing device can further determine a first estimate of cumulative signal and noise energy for the detected utterance in the acoustic input signal with respect to a first time period spanning the duration of the detected utterance, and a second estimate of noise energy in the acoustic input signal with respect to a second time period preceding (or following) the first time period. The computing device can then calculate a signal-to-noise ratio (SNR) for the detected utterance based on the first and second estimates and can reject the detected utterance if the SNR is below an SNR threshold.
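The calculation described above can be illustrated with a minimal sketch. The embodiment does not fix a particular formula, so the following choices are assumptions made only for illustration: each energy estimate is normalized to mean power per sample so that windows of different lengths are comparable, the signal power is obtained by subtracting the noise estimate from the cumulative signal-and-noise estimate, the noise window is taken from the period preceding the utterance, and the SNR is expressed in decibels. The function names, parameters, and the 10 dB default threshold are hypothetical.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average energy per sample (mean of squared amplitudes)."""
    if not samples:
        raise ValueError("window must contain at least one sample")
    return sum(x * x for x in samples) / len(samples)


def utterance_snr_db(signal: Sequence[float], start: int, end: int,
                     noise_len: int) -> float:
    """Estimate the SNR (in dB) of an utterance occupying signal[start:end].

    The noise estimate uses up to `noise_len` samples immediately
    preceding the utterance (an assumed choice; a following window
    could be used instead).
    """
    if not (0 < start < end <= len(signal)) or noise_len <= 0:
        raise ValueError("invalid utterance or noise window")

    # First estimate: cumulative signal and noise power over the utterance.
    signal_plus_noise = mean_power(signal[start:end])

    # Second estimate: noise power over the preceding window.
    noise_start = max(0, start - noise_len)
    noise = mean_power(signal[noise_start:start])

    # Assumed model: signal power = (signal + noise) power - noise power.
    signal_only = signal_plus_noise - noise
    if signal_only <= 0.0:
        return float("-inf")  # utterance is no louder than the background
    if noise == 0.0:
        return float("inf")   # perfectly silent background
    return 10.0 * math.log10(signal_only / noise)


def accept_utterance(signal: Sequence[float], start: int, end: int,
                     noise_len: int, snr_threshold_db: float = 10.0) -> bool:
    """Return False (reject) when the estimated SNR is below the threshold."""
    return utterance_snr_db(signal, start, end, noise_len) >= snr_threshold_db
```

Under these assumptions, a detection whose energy barely exceeds the background level yields a low or negative SNR and is rejected, while a detection that is clearly louder than the preceding noise is accepted.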