Appearance learning systems, methods, and computer products are provided for three-dimensional markerless tracking of robotic surgical tools. An appearance learning approach detects and tracks surgical robotic tools in laparoscopic sequences. A robust visual feature descriptor is trained on low-level landmark features, and the resulting framework fuses robot kinematics with 3D visual observations to track surgical tools over long periods of time and across various types of environments. Three-dimensional tracking is enabled for multiple tools of multiple types with different overall appearances. The presently disclosed subject matter is applicable to surgical robot systems such as the da Vinci® surgical robot in both ex vivo and in vivo environments.
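
The fusion of robot kinematics with 3D visual observations can be illustrated with a minimal sketch. The abstract does not specify the fusion algorithm, so the example below substitutes a simple per-axis Kalman filter that estimates a slowly varying offset between the kinematics-reported tool position and the visually observed position. The class name `OffsetFilter`, its parameters, and the noise values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class OffsetFilter:
    """Per-axis Kalman filter on the kinematics-to-vision position offset.

    Illustrative assumption, not the disclosed method: the error in the
    kinematic position is modeled as a slowly drifting 3D offset.
    """
    process_var: float = 1e-4   # assumed drift of the offset per step
    meas_var: float = 1e-2      # assumed noise of the visual 3D observation
    offset: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    var: List[float] = field(default_factory=lambda: [1.0, 1.0, 1.0])

    def step(self, kin_pos: Sequence[float],
             vis_pos: Optional[Sequence[float]] = None) -> List[float]:
        """Fuse one kinematic reading with an optional visual observation."""
        for i in range(3):
            # Predict: the offset may drift, so its uncertainty grows.
            self.var[i] += self.process_var
            if vis_pos is not None:
                # Update: correct the offset using the visual observation.
                gain = self.var[i] / (self.var[i] + self.meas_var)
                innovation = vis_pos[i] - kin_pos[i] - self.offset[i]
                self.offset[i] += gain * innovation
                self.var[i] *= (1.0 - gain)
        # Fused estimate: kinematic position corrected by the learned offset.
        return [kin_pos[i] + self.offset[i] for i in range(3)]


# Usage: visual observations are available for 50 frames, then lost.
tracker = OffsetFilter()
for _ in range(50):
    tracker.step(kin_pos=[0.0, 0.0, 0.0], vis_pos=[0.01, -0.02, 0.005])
fused = tracker.step(kin_pos=[1.0, 1.0, 1.0])  # no visual observation
```

When no visual observation is available for a frame, the sketch falls back on the kinematic position corrected by the last estimated offset, which is one plausible way for a tracker to bridge frames in which a tool is occluded or undetected.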