Measuring the relatedness between scientific publications has important applications in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness because they address issues that arise when using citation or textual similarity to measure relatedness. While several controlled-vocabulary-based relatedness measures have been developed, there exists no comprehensive and direct test of their accuracy and suitability for different types of research questions. This paper reviews existing measures, develops a new measure, and benchmarks the measures using TREC Genomics data as a ground truth of topics. The benchmark test show that the new measure and the measure proposed by Ahlgren et al. (2020) have differing strengths and weaknesses. These results inform a discussion of which method to choose when studying interdisciplinarity, information retrieval, clustering of science, and researcher topic switching.
翻译:暂无翻译