When directly using existing text generation datasets for controllable generation, we face the problem of lacking domain knowledge, and thus the aspects that can be controlled are limited. A typical example is that when using the CNN/Daily Mail dataset for controllable text summarization, there is no guiding information on the emphasis of summary sentences. A more useful text generator should leverage both the input text and the control signal to guide the generation, which can only be built with a deep understanding of the domain. Motivated by this vision, our paper introduces a new text generation dataset, named MReD. Our new dataset consists of 7,089 meta-reviews, and all of its 45k meta-review sentences are manually annotated with one of 9 carefully defined categories, including abstract, strength, decision, etc. We present experimental results on state-of-the-art summarization models, and propose methods for structure-controlled generation with both extractive and abstractive models using our annotated data. By exploring various settings and analyzing the model behavior with respect to the control signal, we demonstrate the challenges of our proposed task and the value of our dataset MReD. Meanwhile, MReD also allows us to gain a better understanding of the meta-review domain.