Publication rates are skyrocketing across many fields of science, making it difficult to stay up to date with the latest research. Automatically summarizing the latest findings and helping scholars synthesize related work in a given area is therefore an attractive research objective. In this paper we study the problem of citation text generation, where, given a set of cited papers and a citing context, the model should generate the citation text. While citation text generation has been tackled in prior work, existing studies use different datasets and task definitions, which makes it hard to study the task systematically. To address this, we propose CiteBench: a benchmark for citation text generation that unifies the previous datasets and enables standardized evaluation of citation text generation models across task settings and domains. Using the new benchmark, we investigate the performance of multiple strong baselines, test their transferability between the datasets, and deliver new insights into task definition and evaluation to guide future research in citation text generation. We make CiteBench publicly available at https://github.com/UKPLab/citebench.