This paper presents a new data augmentation algorithm for natural language understanding tasks, called RPN (Random Position Noise). Text augmentation methods are currently relatively scarce, and few of the existing ones apply to all sentence-level natural language understanding tasks. RPN moves the augmentation traditionally applied to the original text to the word-vector level: it substitutes values in one or several dimensions of some word vectors. As a result, RPN introduces a certain degree of perturbation into each sample, and the range of perturbation can be adjusted for different tasks. The augmented samples are then used to train the model, which makes the model more robust. In subsequent experiments, we found that adding RPN when training or fine-tuning a model yielded a stable improvement on all 8 natural language processing tasks, including the TweetEval, CoLA, and SST-2 datasets, and larger improvements than other data augmentation algorithms. RPN applies to all sentence-level language understanding tasks and can be used with any deep learning model that has a word embedding layer.
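The substitution step described above can be illustrated with a minimal sketch. The abstract does not specify how words and dimensions are selected or what values are substituted, so everything here is an assumption made for illustration: the function name `random_position_noise`, the parameters `word_ratio`, `dims_per_word`, and `noise_scale`, uniform random selection, and zero-mean Gaussian noise as the replacement value. It should not be read as the authors' implementation.

```python
import numpy as np


def random_position_noise(embeddings, word_ratio=0.3, dims_per_word=2,
                          noise_scale=0.1, rng=None):
    """Hypothetical RPN-style augmentation on one sentence.

    embeddings: array of shape (seq_len, embed_dim), one row per token.
    Returns a perturbed copy; the input array is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = np.array(embeddings, dtype=float)  # np.array copies by default
    seq_len, embed_dim = augmented.shape

    # Assumption: perturb a fixed fraction of the words, at least one.
    n_words = min(seq_len, max(1, int(round(word_ratio * seq_len))))
    n_dims = min(embed_dim, dims_per_word)
    word_idx = rng.choice(seq_len, size=n_words, replace=False)

    for w in word_idx:
        # Assumption: dimensions are chosen uniformly at random per word.
        dim_idx = rng.choice(embed_dim, size=n_dims, replace=False)
        # Assumption: the substituted values are zero-mean Gaussian noise.
        augmented[w, dim_idx] = rng.normal(0.0, noise_scale, size=n_dims)
    return augmented


# Usage: a toy "sentence" of 6 tokens with 8-dimensional embeddings.
sentence = np.ones((6, 8))
noisy = random_position_noise(sentence, word_ratio=0.5,
                              rng=np.random.default_rng(0))
```

In this sketch, `word_ratio`, `dims_per_word`, and `noise_scale` play the role of the adjustable perturbation range mentioned in the abstract: larger values perturb more of each sample. In a real model the same operation would be applied to the output of the word embedding layer during training or fine-tuning.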