Various aspects of a method and system to localize surgical tools during anatomical surgery are disclosed herein. In accordance with an embodiment of the disclosure, the method is implementable in an image-processing engine, which is communicatively coupled to an image-capturing device that captures one or more video frames. The method includes determination of one or more physical characteristics of one or more surgical tools present in the one or more video frames, based on one or more color and geometric constraints. Thereafter, two-dimensional (2D) masks of the one or more surgical tools are detected, based on the one or more physical characteristics of the one or more surgical tools. Further, poses of the one or more surgical tools are estimated, when the 2D masks of the one or more surgical tools are occluded at tips and/or ends of the one or more surgical tools.