We propose a method for measuring the similarity of papers and authors by simulating a literature search procedure on citation networks, a conceptualization of similarity inspired by information retrieval. This transition probability (TP) based approach does not require a curated classification system, avoids the complications of clustering, and provides a continuous measure of similarity. We run test scenarios to explore several versions of the general TP concept and the Node2vec machine-learning technique. We find that TP measures outperform Node2vec in mapping the macroscopic structure of fields. The paper provides a general discussion of how to implement TP similarity measurement, with a particular focus on how to use publication-level information to approximate the research interest similarity of individual scientists. The paper is accompanied by a Python package capable of calculating all the tested metrics.
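To make the TP idea concrete, the following is a minimal sketch of one possible reading of a simulated literature search, not the definition used in the paper or the interface of its accompanying package. It assumes a two-step walk: a reader starts at a paper, picks one of its references uniformly at random, then picks uniformly at random one of the papers citing that reference. The function name `transition_probabilities`, the dictionary input format, and the toy network are all hypothetical.

```python
from collections import defaultdict


def transition_probabilities(citations, start):
    """Two-step walk: start -> random reference -> random citing paper.

    `citations` maps each paper to the list of papers it cites.
    This is an illustrative assumption, not the paper's TP definition.
    """
    # Invert the citation map: cited paper -> papers that cite it.
    cited_by = defaultdict(list)
    for paper, refs in citations.items():
        for ref in refs:
            cited_by[ref].append(paper)

    probs = defaultdict(float)
    refs = citations.get(start, [])
    for ref in refs:
        citers = cited_by[ref]  # always contains `start` itself
        for citer in citers:
            # Probability of choosing this reference, then this citer.
            probs[citer] += 1.0 / len(refs) / len(citers)
    return dict(probs)


# Hypothetical toy network: A and B share reference X, A and C share Y.
toy = {"A": ["X", "Y"], "B": ["X"], "C": ["Y", "Z"]}
p = transition_probabilities(toy, "A")
print(p)
```

Under these assumptions the probabilities over destination papers sum to one, so the result is a continuous score that needs no classification system or clustering step. In this sketch the walker may also return to the starting paper.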