Machine-generated citation sentences can aid automated scientific literature review and assist article writing. Current methods for generating citation text are limited to single-citation generation, taking the citing document and one cited document as input. However, in real-world situations, writers often summarize several studies in one sentence or discuss relevant information across an entire paragraph. In addition, multiple citation intents have been identified in prior work, implying that writers may need control over the intents of generated sentences to cover different scenarios. Therefore, this work focuses on generating multiple citations and releases a newly collected dataset, CiteMI, to drive future research. First, we build a novel generation model based on the Fusion-in-Decoder approach to cope with multiple long inputs. Second, we incorporate predicted citation intents into training to enable intent control. The experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
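The Fusion-in-Decoder idea mentioned above can be illustrated with a minimal sketch: each cited document is paired with the citing context and encoded independently, and the per-passage encoder states are then concatenated so that a single decoder can attend over all of them. The code below is an assumption-laden illustration, not the model described in this work. The function names, the `intent: ... context: ... cited: ...` prefix format, the `background` intent label, and the toy encoder (a stand-in for a real Transformer encoder) are all hypothetical.

```python
from typing import List


def build_fid_passages(citing_context: str, cited_abstracts: List[str], intent: str) -> List[str]:
    """Build one input passage per cited document.

    Each passage repeats the citing context and a (hypothetical) intent tag,
    so every passage can be encoded on its own.
    """
    return [
        f"intent: {intent} context: {citing_context} cited: {abstract}"
        for abstract in cited_abstracts
    ]


def toy_encode(passage: str, dim: int = 4) -> List[List[float]]:
    """Stand-in for a Transformer encoder (assumption, for illustration only).

    Returns one deterministic dim-sized vector per whitespace token.
    """
    return [[((len(tok) + i) % 7) / 7.0 for i in range(dim)] for tok in passage.split()]


def fuse_for_decoder(encoded: List[List[List[float]]]) -> List[List[float]]:
    """Fusion step: concatenate per-passage encoder states along the sequence axis.

    A decoder would then cross-attend over this single long sequence.
    """
    fused: List[List[float]] = []
    for states in encoded:
        fused.extend(states)
    return fused


passages = build_fid_passages(
    citing_context="Prior work generates one citation at a time.",
    cited_abstracts=[
        "Paper A studies intent classification.",
        "Paper B proposes a summarizer.",
    ],
    intent="background",  # hypothetical intent label
)
encoded = [toy_encode(p) for p in passages]  # each passage is encoded independently
fused = fuse_for_decoder(encoded)            # the decoder sees all passages jointly
```

Because each passage is encoded separately, encoder cost grows linearly with the number of cited documents, while the decoder still has access to every document when producing the citation sentence. In a real system the toy encoder would be replaced by a pretrained encoder-decoder model.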