Recently introduced transformer-based article encoders (TAEs), designed to produce similar vector representations for mutually related scientific articles, have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, the existing benchmark datasets are predominantly focused on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in candidate pools. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare TAEs against the lexical retrieval baseline BM25 on the task of citation recommendation, where the model produces a list of articles to be cited in a given input article. We find that BM25 remains very competitive with state-of-the-art neural retrievers, a surprising result given the strong performance of TAEs on small benchmarks. As a remedy for the limitations of the existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations: the Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.