A video conference endpoint includes a microphone array that detects ultrasound transmitted by a user device and encoded with a user identity. The endpoint determines a position of the user device relative to the microphone array based on the detected ultrasound, and recovers the user identity from the detected ultrasound. The microphone array also detects audio from an active talker in a frequency range perceptible to humans, and the endpoint determines a position of the active talker relative to the microphone array based on the detected audio. The endpoint then determines whether the position of the active talker and the position of the user device are within a predetermined positional range of each other. If they are, the endpoint assigns the user identity to the active talker.
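
The final matching step can be illustrated with a minimal sketch. It assumes both positions have already been estimated as 3-D coordinates in meters relative to the microphone array, and that the predetermined positional range is a simple Euclidean distance threshold. The names `DeviceDetection` and `assign_identity` and the 0.5 m default are hypothetical and chosen only for illustration; the abstract does not specify how the range is defined.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: positions are (x, y, z) coordinates in meters,
# measured relative to the microphone array.
Position = Tuple[float, float, float]


@dataclass
class DeviceDetection:
    """Hypothetical result of decoding one ultrasound transmission."""
    user_identity: str   # identity recovered from the ultrasound signal
    position: Position   # device position estimated from the ultrasound


def assign_identity(
    talker_position: Position,
    device: DeviceDetection,
    max_distance_m: float = 0.5,  # assumed threshold, not from the source
) -> Optional[str]:
    """Return the device's user identity if the active talker and the
    device are within max_distance_m of each other, otherwise None."""
    distance = math.dist(talker_position, device.position)
    if distance <= max_distance_m:
        return device.user_identity
    return None
```

For example, a talker located at `(1.0, 2.0, 0.0)` and a device at `(1.2, 2.1, 0.0)` are about 0.22 m apart, so under the assumed 0.5 m threshold the device's identity would be assigned to that talker.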