Disclosed herein are a method and a system for determining the quality of a semen sample. Trajectories of objects identified in each of a plurality of image frames of the semen sample are generated by tracking the movement of the objects across the image frames and compensating for a drift velocity of the semen sample. The generated trajectories are then classified into sperm and non-sperm trajectories. Finally, a total concentration estimate and a total motility estimate of the semen sample are computed to generate a semen quality index, which indicates the quality of the semen sample. In an embodiment, the method of the present disclosure uses a multi-level Convolutional Neural Network (CNN) analysis technique to classify the object trajectories into sperm and non-sperm trajectories. Because the method estimates and compensates for the drift velocity in the semen sample, it improves the accuracy of both the motility estimation and the overall semen quality analysis.
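
The effect of drift compensation on a motility estimate can be illustrated with a minimal sketch. The disclosure does not specify here how the drift velocity is estimated, so the sketch assumes a simple estimator: the median per-frame displacement over all tracked objects. The function names (`estimate_drift`, `compensate_drift`, `mean_speed`, `motile_fraction`), the speed threshold, and the sample coordinates are all hypothetical and chosen only for illustration. The CNN classification step is not modeled.

```python
from statistics import median


def estimate_drift(trajectories):
    """Assumed estimator: median per-frame displacement (dx, dy) over all tracks."""
    dxs, dys = [], []
    for track in trajectories:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            dxs.append(x1 - x0)
            dys.append(y1 - y0)
    if not dxs:
        return (0.0, 0.0)
    return (median(dxs), median(dys))


def compensate_drift(trajectories, drift):
    """Subtract the accumulated drift from every point of every track."""
    vx, vy = drift
    return [
        [(x - vx * i, y - vy * i) for i, (x, y) in enumerate(track)]
        for track in trajectories
    ]


def mean_speed(track):
    """Average distance travelled per frame along one track."""
    steps = [
        ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    return sum(steps) / len(steps) if steps else 0.0


def motile_fraction(tracks, min_speed):
    """Fraction of tracks whose mean speed exceeds a hypothetical threshold."""
    if not tracks:
        return 0.0
    motile = sum(1 for t in tracks if mean_speed(t) > min_speed)
    return motile / len(tracks)


# Hypothetical data: two immotile objects carried by the fluid drift,
# and one object that also moves under its own power.
raw_tracks = [
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.5), (7.0, 6.0)],
    [(10.0, 10.0), (14.0, 10.5), (18.0, 11.0)],
]

drift = estimate_drift(raw_tracks)
compensated_tracks = compensate_drift(raw_tracks, drift)

fraction_before = motile_fraction(raw_tracks, min_speed=0.5)
fraction_after = motile_fraction(compensated_tracks, min_speed=0.5)
```

In this made-up data, every object appears to move before compensation because the whole sample drifts. After the estimated drift is subtracted, only the third object retains a non-zero speed, so the motile fraction is no longer inflated by the drift.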