A system having one or more processors and memory receives speech data from both a first participant and a second participant of a session. The system outputs the speech of the first participant. When the speech of the second participant temporally overlaps less than a first predetermined threshold amount of a terminal portion of the speech of the first participant, the system outputs the speech of the second participant in accordance with an adjustment of the speech of a participant of the session. When the speech of the second participant temporally overlaps more than the first predetermined threshold amount of the terminal portion of the speech of the first participant, the system drops the speech of the second participant. Optionally, the system adjusts the speech of a participant of the session by delaying output of the speech of the second participant.
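
The overlap decision described above can be illustrated with a minimal sketch. It is not the claimed implementation. The names `Utterance`, `overlap_seconds`, and `handle_second_speech` are hypothetical, as is the representation of speech as start and end times in seconds. The sketch assumes that the adjustment is the optional delay of the second participant's speech until the first participant's speech ends, and that an overlap exactly equal to the threshold is dropped, a case the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    """Hypothetical representation of one participant's speech."""
    start: float  # seconds from session start
    end: float    # seconds from session start


def overlap_seconds(first: Utterance, second: Utterance) -> float:
    """Duration for which the two utterances overlap in time (0 if none)."""
    return max(0.0, min(first.end, second.end) - max(first.start, second.start))


def handle_second_speech(
    first: Utterance, second: Utterance, threshold_s: float
) -> Tuple[str, Optional[float]]:
    """Return ("output", delay_s) or ("drop", None) for the second speech.

    Assumption: the adjustment is a delay that shifts the second speech
    so that it starts once the first speech has ended.
    Assumption: overlap equal to the threshold is treated as a drop.
    """
    overlap = overlap_seconds(first, second)
    if overlap < threshold_s:
        delay_s = max(0.0, first.end - second.start)
        return ("output", delay_s)
    return ("drop", None)


first = Utterance(start=0.0, end=5.0)
brief_overlap = Utterance(start=4.5, end=7.0)  # overlaps the last 0.5 s
long_overlap = Utterance(start=3.0, end=7.0)   # overlaps the last 2.0 s

print(handle_second_speech(first, brief_overlap, threshold_s=1.0))
print(handle_second_speech(first, long_overlap, threshold_s=1.0))
```

With a threshold of 1.0 second, the first call keeps the second participant's speech and delays it by the 0.5 seconds of overlap, while the second call drops it.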