Provided is a process including: obtaining a first graph comprising nodes and edges, each of the first-graph edges linking two of the first-graph nodes and denoting semantic similarity of unstructured text in documents corresponding to the two linked first-graph nodes; for each of the first-graph nodes, selecting nodes for a second graph from attributes of the unstructured text documents to which the first-graph node corresponds, wherein the attributes are entities mentioned in the unstructured text documents, and wherein each of the second-graph nodes corresponds to a respective selected attribute; and for each pair of the second-graph nodes, determining a respective edge weight indicating similarity between a first entity corresponding to a first node of the respective pair and a second entity corresponding to a second node of the respective pair.
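The three steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The input layout (a dictionary from first-graph node to the entity sets of its documents), the function name `build_entity_graph`, the `min_nodes` selection threshold, and the use of Jaccard overlap as the entity-similarity measure are all assumptions made for illustration; the text above does not specify how entities are selected or how similarity is computed.

```python
from itertools import combinations

# Hypothetical first graph: each node corresponds to one or more documents,
# and each document is represented by the set of entities it mentions.
first_graph_nodes = {
    "n1": [{"Acme", "Globex"}],
    "n2": [{"Acme", "Globex"}, {"Initech"}],
    "n3": [{"Initech"}],
}
# First-graph edges denote semantic similarity between the linked nodes'
# documents. They are shown for completeness; this sketch does not use them.
first_graph_edges = {("n1", "n2"): 0.8, ("n2", "n3"): 0.4}


def build_entity_graph(nodes, min_nodes=1):
    """Return (second-graph nodes, edge weights) for the mentioned entities.

    Assumption: an entity is selected if it is mentioned under at least
    `min_nodes` first-graph nodes, and the weight between two entities is
    the Jaccard overlap of the first-graph nodes that mention them.
    """
    # For each first-graph node, collect the entities mentioned in its documents.
    mentioned_in = {}  # entity -> set of first-graph node ids
    for node_id, documents in nodes.items():
        for entities in documents:
            for entity in entities:
                mentioned_in.setdefault(entity, set()).add(node_id)

    # Select second-graph nodes, one per retained entity.
    selected = {e: ids for e, ids in mentioned_in.items() if len(ids) >= min_nodes}

    # For each pair of second-graph nodes, determine an edge weight.
    weights = {}
    for a, b in combinations(sorted(selected), 2):
        shared = selected[a] & selected[b]
        either = selected[a] | selected[b]
        weights[(a, b)] = len(shared) / len(either)
    return sorted(selected), weights


nodes, weights = build_entity_graph(first_graph_nodes)
for pair, weight in weights.items():
    print(pair, round(weight, 3))
```

In this toy input, "Acme" and "Globex" are mentioned under exactly the same first-graph nodes and so receive the highest weight, while each shares only one node with "Initech". Any other similarity measure, such as one based on embeddings of the entities' surrounding text, could replace the Jaccard computation without changing the overall structure.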