Implementations generally relate to detecting dominant tools in surgical videos. In some implementations, a method includes receiving at least one image frame. The method further includes detecting one or more objects in the at least one image frame. The method further includes classifying the one or more objects, which are tools, into one or more tool classifications. The method further includes determining a handedness of each of the one or more tools. The method further includes determining a dominant tool from the one or more tools based at least in part on the tool classifications of the one or more tools and at least in part on the handedness of the one or more tools.
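The abstract does not say how the tool classifications and handedness are combined to pick the dominant tool. The following is a minimal sketch under stated assumptions, not the described method. It assumes each detected tool carries a class label, a handedness label ("left" or "right"), and a confidence score. It also assumes the dominant tool is the one with the highest weighted score, where a hypothetical per-class priority table and a preference for the surgeon's dominant hand act as weights. `DetectedTool`, `CLASS_PRIORITY`, and `dominant_tool` are illustrative names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DetectedTool:
    """One detected and classified tool in a frame (hypothetical structure)."""
    tool_class: str    # illustrative labels, e.g. "grasper", "scissors"
    handedness: str    # "left" or "right", as inferred for this tool
    confidence: float  # detection/classification confidence in [0, 1]


# Hypothetical priority weights (an assumption, not from the source):
# classes more likely to be the "working" tool receive a higher weight.
CLASS_PRIORITY = {"scissors": 1.0, "hook": 1.0, "clipper": 0.8, "grasper": 0.5}
UNKNOWN_CLASS_WEIGHT = 0.1   # weight for classes missing from the table
OFF_HAND_WEIGHT = 0.5        # down-weights tools held in the non-dominant hand


def dominant_tool(tools: Sequence[DetectedTool],
                  dominant_hand: str = "right") -> Optional[DetectedTool]:
    """Return the highest-scoring tool, or None if no tools were detected.

    Score = class priority * handedness weight * confidence.
    """
    if not tools:
        return None

    def score(tool: DetectedTool) -> float:
        class_weight = CLASS_PRIORITY.get(tool.tool_class, UNKNOWN_CLASS_WEIGHT)
        hand_weight = 1.0 if tool.handedness == dominant_hand else OFF_HAND_WEIGHT
        return class_weight * hand_weight * tool.confidence

    # max() keeps the first tool encountered when scores tie.
    return max(tools, key=score)


# Usage: a left-hand grasper and a right-hand scissors in the same frame.
frame_tools = [
    DetectedTool("grasper", "left", 0.95),
    DetectedTool("scissors", "right", 0.90),
]
print(dominant_tool(frame_tools).tool_class)  # prints "scissors"
```

In this example the scissors score 1.0 × 1.0 × 0.90 = 0.90, while the grasper scores 0.5 × 0.5 × 0.95 ≈ 0.24. A multiplicative score makes both signals matter, so a high-priority tool in the non-dominant hand can still win over a low-priority tool in the dominant hand. A real system might instead learn this combination or smooth it across frames.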