The field of scientometrics has shown the power of citation-based clusters for literature analysis, yet this technique has rarely been applied to information retrieval. This work evaluates the performance of citation-based clusters on information retrieval tasks. We simulated a search process over these clusters using a tree hierarchy of clusters and a cluster selection algorithm, and evaluated it on the task of finding the relevant documents for 25 systematic reviews. Our evaluation considered several trade-offs between recall and precision in cluster selection, and we also replicated the Boolean queries self-reported by the systematic reviews to serve as a reference. We found that the search performance of citation-based clusters is highly variable and unpredictable, that it works best for users who prefer recall over precision at a ratio between 2 and 8, and that citation-based clusters and query-based search complement each other when used together, including by finding new relevant documents.
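The trade-off between recall and precision in cluster selection can be illustrated with a small sketch. The abstract does not say which measure or selection rule was used, so the F-beta score here is an assumption, chosen because it is a common way to express that recall matters beta times as much as precision. The names `Cluster`, `f_beta`, and `best_cluster`, as well as the toy hierarchy, are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A node in a hypothetical cluster tree: its documents and sub-clusters."""
    docs: frozenset
    children: list = field(default_factory=list)


def f_beta(n_found: int, n_retrieved: int, n_relevant: int, beta: float) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if n_found == 0:
        return 0.0
    precision = n_found / n_retrieved
    recall = n_found / n_relevant
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_cluster(root: Cluster, relevant: set, beta: float):
    """Walk the whole tree and return the cluster with the highest F-beta."""
    best, best_score = root, -1.0
    stack = [root]
    while stack:
        node = stack.pop()
        found = len(node.docs & relevant)
        score = f_beta(found, len(node.docs), len(relevant), beta)
        if score > best_score:
            best, best_score = node, score
        stack.extend(node.children)
    return best, best_score


# Toy hierarchy (invented data): ten documents split into nested clusters.
a1 = Cluster(frozenset({1, 2}))
a2 = Cluster(frozenset({3, 4}))
a = Cluster(frozenset({1, 2, 3, 4}), [a1, a2])
b = Cluster(frozenset({5, 6, 7, 8, 9, 10}))
root = Cluster(frozenset(range(1, 11)), [a, b])

relevant = {1, 2, 5}  # documents a hypothetical review judged relevant

balanced, _ = best_cluster(root, relevant, beta=1)      # precision and recall equal
recall_heavy, _ = best_cluster(root, relevant, beta=4)  # recall favored
```

In this toy setup, a balanced preference (beta = 1) selects the small, pure cluster `a1`, while a recall-heavy preference (beta = 4) selects the root, which contains every relevant document at the cost of low precision.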