Adversarial regularization can improve model generalization in many natural language processing tasks. However, conventional approaches are computationally expensive, since they need to generate a perturbation for each sample in each epoch. We propose a new adversarial regularization method, ARCH (adversarial regularization with caching), in which perturbations are generated and cached once every several epochs. Because caching every perturbation raises memory-usage concerns, we adopt a K-nearest-neighbors-based strategy that only requires caching a small number of perturbations and introduces no additional training time. We evaluate the proposed method on a set of neural machine translation and natural language understanding tasks. We observe that ARCH significantly eases the computational burden, saving up to 70\% of training time compared with conventional approaches. More surprisingly, by reducing the variance of the stochastic gradients, ARCH yields notably better (on most tasks) or comparable model generalization. Our code is publicly available.
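To make the caching idea concrete, here is a minimal PyTorch-style sketch, not the paper's implementation: the names `adv_perturbation`, `nearest_cached`, and `train_cached_adv` are hypothetical, the model is assumed to map fixed-shape input embeddings directly to logits, and a one-step FGSM-style attack plus a simple 1-NN lookup stand in for ARCH's actual perturbation generator and KNN-based caching strategy.

```python
import torch
import torch.nn.functional as F

def adv_perturbation(model, embeds, labels, epsilon=1e-2):
    # One-step, FGSM-style perturbation on input embeddings
    # (a common stand-in; the paper's generator may differ).
    embeds = embeds.detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeds), labels)
    grad, = torch.autograd.grad(loss, embeds)
    return (epsilon * grad.sign()).detach()

def nearest_cached(keys, vals, queries):
    # 1-NN lookup: each query reuses the perturbation cached for its
    # closest anchor embedding (illustrative KNN retrieval).
    dists = torch.cdist(queries.flatten(1), keys.flatten(1))
    return vals[dists.argmin(dim=1)]

def train_cached_adv(model, loader, optimizer,
                     epochs=10, refresh_every=3, cache_size=512):
    keys = vals = None
    for epoch in range(epochs):
        refresh = epoch % refresh_every == 0  # regenerate every few epochs
        for embeds, labels in loader:
            if refresh:
                delta = adv_perturbation(model, embeds, labels)
                # Cache only a bounded number of (embedding, perturbation)
                # pairs so memory usage stays small.
                keys = (embeds.detach() if keys is None
                        else torch.cat([keys, embeds.detach()]))[-cache_size:]
                vals = (delta if vals is None
                        else torch.cat([vals, delta]))[-cache_size:]
            else:
                # Off-refresh epochs: reuse cached perturbations instead
                # of running the expensive adversarial generator.
                delta = nearest_cached(keys, vals, embeds)
            loss = F.cross_entropy(model(embeds + delta), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

The savings come from the `else` branch: on most epochs no extra forward/backward pass is needed to craft perturbations, and the bounded cache keeps memory proportional to `cache_size` rather than to the dataset.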