This paper explores three simple data manipulation techniques (synthesis, augmentation, and curriculum learning) for improving abstractive summarization models without requiring any additional data. We introduce a data synthesis method based on paraphrasing, a data augmentation technique based on sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. Our experiments show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance both when applied individually and when combined.
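The abstract does not define how the abstractiveness difficulty metric is computed. As a rough illustration only, the sketch below assumes a common proxy, the fraction of summary n-grams that never appear in the source, and uses it to order training pairs from easy (mostly extractive) to hard (highly abstractive). The function names `novel_ngram_ratio` and `curriculum_order`, the whitespace tokenization, and the default of bigrams are hypothetical choices, not the paper's definitions.

```python
from typing import List, Tuple


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Hypothetical abstractiveness proxy: share of summary n-grams absent from the source."""
    src_tokens = source.lower().split()
    sum_tokens = summary.lower().split()
    # Set of source n-grams for fast membership checks.
    src_ngrams = {tuple(src_tokens[i:i + n]) for i in range(len(src_tokens) - n + 1)}
    sum_ngrams = [tuple(sum_tokens[i:i + n]) for i in range(len(sum_tokens) - n + 1)]
    if not sum_ngrams:
        # A summary shorter than n tokens has no n-grams; treat it as fully extractive.
        return 0.0
    novel = sum(1 for g in sum_ngrams if g not in src_ngrams)
    return novel / len(sum_ngrams)


def curriculum_order(pairs: List[Tuple[str, str]], n: int = 2) -> List[Tuple[str, str]]:
    """Sort (source, summary) pairs from least to most abstractive (easy to hard)."""
    return sorted(pairs, key=lambda p: novel_ngram_ratio(p[0], p[1], n))


if __name__ == "__main__":
    data = [
        ("the cat sat on the mat today", "a feline rested on a rug"),
        ("the cat sat on the mat today", "the cat sat on the mat"),
    ]
    for src, summ in curriculum_order(data):
        print(round(novel_ngram_ratio(src, summ), 2), summ)
```

Sorting the data once before training is the simplest form of curriculum; a paced schedule that gradually widens the pool of admitted examples is a common alternative.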