Data augmentation is an effective way to improve the performance of many neural text generation models. However, current data augmentation methods require defining or choosing suitable data mapping functions that map original samples into augmented samples. In this work, we derive an objective that formulates the problem of data augmentation for text generation tasks without using any augmented data constructed by specific mapping functions. Our proposed objective can be optimized efficiently and applied to popular loss functions for text generation with a guaranteed convergence rate. Experiments on five datasets spanning two text generation tasks show that our approach can match or even surpass popular data augmentation methods.