Methods and systems for virtual coaching and performance training using a mobile device are disclosed. The methods and systems perform the steps of capturing a training video of a player using a camera on the mobile device; augmenting the training video with a visual cue for a cue period starting from a first time instant; determining whether the player has responded to the visual cue at a second time instant within the cue period by analyzing a body posture flow of the player between the first time instant and the second time instant, wherein the body posture flow is extracted from the training video by applying a computer vision algorithm to one or more frames of the training video; and, in response to determining that the player has responded to the visual cue, generating feedback for the player.
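By way of a non-limiting illustration, the response-determination step may be sketched as follows. The sketch assumes the body posture flow has already been extracted from the training video as a time-ordered list of timestamped joint positions; the extraction itself (the computer vision algorithm) is not shown. The names `VisualCue`, `response_time`, and `feedback`, the choice of the right wrist as the tracked joint, and the rule that a response occurs when that joint comes within a pixel radius of the cue are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Optional, Tuple

# One posture sample: joint name -> (x, y) position in pixels.
Keypoints = Dict[str, Tuple[float, float]]
# Body posture flow: time-ordered (timestamp in seconds, keypoints) pairs.
PostureFlow = List[Tuple[float, Keypoints]]


@dataclass
class VisualCue:
    start: float                  # first time instant, in seconds
    duration: float               # length of the cue period, in seconds
    target: Tuple[float, float]   # on-screen position of the cue, in pixels
    radius: float                 # hypothetical hit tolerance, in pixels


def response_time(cue: VisualCue, flow: PostureFlow,
                  joint: str = "right_wrist") -> Optional[float]:
    """Return the second time instant (first response within the cue
    period), or None if the player did not respond in time."""
    end = cue.start + cue.duration
    for t, keypoints in flow:
        if t < cue.start or t > end:
            continue  # sample lies outside the cue period
        if joint not in keypoints:
            continue  # joint was not detected in this frame
        x, y = keypoints[joint]
        # Assumed response rule: the joint reaches the cue's target region.
        if hypot(x - cue.target[0], y - cue.target[1]) <= cue.radius:
            return t
    return None


def feedback(cue: VisualCue, flow: PostureFlow) -> Optional[str]:
    """Generate feedback only when a response has been determined."""
    t = response_time(cue, flow)
    if t is None:
        return None
    return f"Hit! Reaction time: {t - cue.start:.2f} s"


# Usage with made-up data: the cue appears at 1.0 s and lasts 2.0 s.
flow = [
    (0.0, {"right_wrist": (0.0, 0.0)}),
    (1.2, {"right_wrist": (50.0, 50.0)}),
    (1.5, {"right_wrist": (98.0, 101.0)}),
]
cue = VisualCue(start=1.0, duration=2.0, target=(100.0, 100.0), radius=10.0)
print(feedback(cue, flow))
```

In this sketch the reaction time is the difference between the second and first time instants. Other response rules, such as a joint-angle threshold or a full-body pose match, could be substituted inside `response_time` without changing the surrounding flow.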