Described herein are systems and methods for combining medical findings received across different modalities by comparing feature vectors of current and prior images. The system comprises an extraction module that extracts contextual information from an image of an area of interest, the image including annotations; a feature selection module that builds a current feature vector from the extracted contextual information and the annotations; and a referencing engine that computes a similarity score between the current feature vector and a prior feature vector of a prior image. The method comprises extracting contextual information from an image of an area of interest, the image including annotations; building a current feature vector from the extracted contextual information and the annotations; and computing a similarity score between the current feature vector and a prior feature vector of a prior image.
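The feature-vector and similarity steps can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the contextual information is taken to be a few categorical fields (modality, body part, laterality), the annotations are taken to be finding labels, both are one-hot encoded against fixed vocabularies, and cosine similarity stands in for the unspecified similarity score. All names here (CONTEXT_VOCAB, ANNOTATION_VOCAB, AnnotatedImage, build_feature_vector, similarity_score) are hypothetical.

```python
import math
from dataclasses import dataclass, field

# Hypothetical vocabularies; the abstract does not specify which contextual
# fields or annotation types are used.
CONTEXT_VOCAB = {
    "modality": ["CT", "MR", "US"],
    "body_part": ["liver", "lung", "breast"],
    "laterality": ["left", "right", "none"],
}
ANNOTATION_VOCAB = ["lesion", "cyst", "nodule", "calcification"]


@dataclass
class AnnotatedImage:
    context: dict[str, str]                              # extracted contextual information
    annotations: list[str] = field(default_factory=list)  # finding labels on the image


def build_feature_vector(image: AnnotatedImage) -> list[float]:
    """One-hot encode each contextual field, then the annotation labels (assumed encoding)."""
    vec: list[float] = []
    for key, values in CONTEXT_VOCAB.items():
        observed = image.context.get(key)
        vec.extend(1.0 if observed == v else 0.0 for v in values)
    vec.extend(1.0 if label in image.annotations else 0.0 for label in ANNOTATION_VOCAB)
    return vec


def similarity_score(current: list[float], prior: list[float]) -> float:
    """Cosine similarity (an assumed choice); returns 0.0 if either vector is all zeros."""
    if len(current) != len(prior):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(current, prior))
    norm = math.sqrt(sum(a * a for a in current)) * math.sqrt(sum(b * b for b in prior))
    return dot / norm if norm else 0.0


# Usage: a current CT finding compared against a prior MR finding of the same organ.
current = AnnotatedImage({"modality": "CT", "body_part": "liver", "laterality": "none"}, ["lesion"])
prior = AnnotatedImage({"modality": "MR", "body_part": "liver", "laterality": "none"}, ["lesion"])
print(similarity_score(build_feature_vector(current), build_feature_vector(prior)))  # 0.75
```

Here the two images agree on body part, laterality, and finding but differ in modality, so three of four active features match and the score is 0.75. A higher score would suggest the current finding refers to the same finding as the prior one, which is one plausible way a referencing engine could link findings across modalities.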