A device with an environmental sound mixing function includes a case, an input jack, an output jack, a microphone, a mixing module and an output detecting module. The case includes a hole. The input jack and the output jack are disposed on the case. The microphone is disposed in the case and aligned with the hole. The mixing module is electrically connected to the input jack, the output jack and the microphone. The microphone receives an environmental sound through the hole and converts the environmental sound into a microphone signal. The mixing module receives an audio signal through the input jack, mixes the audio signal with the microphone signal to generate a mixed signal, and outputs the mixed signal through the output jack. When the output detecting module detects that headphones are connected to the output jack, the output detecting module drives the mixing module to start operating.
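The signal flow described above can be illustrated with a minimal software sketch. This is only an illustration under stated assumptions, not the device's actual circuitry: the names `mix_samples`, `MixerDevice`, and `mic_gain`, the normalized sample range of [-1.0, 1.0], the additive mixing with clipping, and the choice to produce no output while no headphones are detected are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List, Sequence


def mix_samples(audio: Sequence[float], mic: Sequence[float],
                mic_gain: float = 0.5) -> List[float]:
    """Mix an audio signal with a microphone signal, sample by sample.

    Assumption: samples are floats in [-1.0, 1.0]; mixing is a weighted
    sum clipped back into that range.
    """
    if len(audio) != len(mic):
        raise ValueError("audio and mic blocks must have the same length")
    return [max(-1.0, min(1.0, a + mic_gain * m)) for a, m in zip(audio, mic)]


@dataclass
class MixerDevice:
    """Hypothetical model of the mixing module gated by output detection."""
    headphones_connected: bool = False  # state reported by the output detecting module
    mic_gain: float = 0.5               # assumed weight of the environmental sound

    def process(self, audio: Sequence[float], mic: Sequence[float]) -> List[float]:
        # The mixing module only operates once headphones are detected.
        # Returning an empty block while idle is an assumption of this sketch.
        if not self.headphones_connected:
            return []
        return mix_samples(audio, mic, self.mic_gain)
```

In this sketch, setting `headphones_connected` to `True` stands in for the output detecting module driving the mixing module to start; `process` then returns the mixed signal that would be sent to the output jack.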