Data augmentation methods for Natural Language Processing tasks have been explored in recent years; however, they remain limited, and it is hard to capture diversity at the sentence level. Moreover, it is not always possible to perform data augmentation on supervised tasks. To address these problems, we propose a neural data augmentation method that combines a Conditional Variational Autoencoder with an encoder-decoder Transformer model. While encoding and decoding the input sentence, our model captures the syntactic and semantic representation of the input language together with its class condition. Following recent developments in pre-trained language models, we train and evaluate our models on several benchmarks to strengthen downstream tasks. We compare our method with three different augmentation techniques. The presented results show that our model improves the performance of current models compared to other data augmentation techniques, at a small computational cost.
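The core mechanism described above, conditioning the latent representation on the class label and sampling from it to generate augmented sentences, can be sketched minimally as follows. This is an illustrative toy sketch in numpy, not the paper's implementation: the weight matrices, dimensions, and the pooled sentence embedding are hypothetical stand-ins, and the Transformer encoder/decoder is abstracted away.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(label, num_classes):
    # Class condition as a one-hot vector.
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def encode(x, label, num_classes, W_mu, W_logvar):
    # Concatenate the sentence representation with its class condition,
    # then project to the mean and log-variance of the latent Gaussian.
    h = np.concatenate([x, one_hot(label, num_classes)])
    return W_mu @ h, W_logvar @ h

def reparameterize(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # which keeps sampling differentiable during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy dimensions (hypothetical, for illustration only).
input_dim, num_classes, latent_dim = 8, 2, 4
W_mu = rng.standard_normal((latent_dim, input_dim + num_classes)) * 0.1
W_logvar = rng.standard_normal((latent_dim, input_dim + num_classes)) * 0.1

x = rng.standard_normal(input_dim)   # stands in for a pooled sentence embedding
mu, logvar = encode(x, 1, num_classes, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)  # latent code the decoder would expand
print(z.shape)
```

In a full CVAE, the decoder would take `z` together with the same class condition and generate a new sentence of that class, which is what makes the sampled outputs usable as label-preserving augmented training data.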