Adversarial regularization can improve model generalization in many natural language processing tasks. However, conventional approaches are computationally expensive, since they need to generate a perturbation for each sample in each epoch. We propose a new adversarial regularization method, ARCH (adversarial regularization with caching), where perturbations are generated and cached only once every several epochs. Since caching all the perturbations raises memory concerns, we adopt a K-nearest-neighbors-based strategy to address this issue. The strategy requires caching only a small number of perturbations, without introducing additional training time. We evaluate our proposed method on a set of neural machine translation and natural language understanding tasks. We observe that ARCH significantly eases the computational burden, saving up to 70% of the computational time compared with conventional approaches. More surprisingly, by reducing the variance of stochastic gradients, ARCH yields notably better (in most tasks) or comparable model generalization. Our code is available at https://github.com/SimiaoZuo/Caching-Adv.
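To make the caching idea concrete, below is a minimal sketch of the general scheme, assuming a PyTorch setup. The toy model, the one-step FGSM-style perturbation, the refresh interval, the cache size, and the nearest-neighbor lookup via torch.cdist are illustrative assumptions, not the authors' exact ARCH implementation.

```python
# Illustrative sketch: cache adversarial perturbations every few epochs and
# reuse them via nearest-neighbor lookup instead of regenerating per epoch.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(256, 16), torch.randint(0, 2, (256,))

REFRESH_EVERY = 3   # regenerate the cache once every several epochs (assumed)
CACHE_SIZE = 32     # cache only a small subset of perturbations (assumed)
EPSILON = 0.1       # perturbation magnitude (assumed)

cache_keys, cache_deltas = None, None


def generate_perturbations(inputs, labels):
    """One-step (FGSM-style) adversarial perturbations of the inputs."""
    inputs = inputs.clone().requires_grad_(True)
    loss = F.cross_entropy(model(inputs), labels)
    grad, = torch.autograd.grad(loss, inputs)
    return EPSILON * grad.sign()


for epoch in range(9):
    if epoch % REFRESH_EVERY == 0:
        # Refresh the cache: perturb a small random subset and store it.
        idx = torch.randperm(x.size(0))[:CACHE_SIZE]
        cache_keys = x[idx].detach()
        cache_deltas = generate_perturbations(x[idx], y[idx]).detach()

    # Reuse cached perturbations: each sample borrows the perturbation of its
    # nearest cached neighbor instead of recomputing one every epoch.
    nn_idx = torch.cdist(x, cache_keys).argmin(dim=1)
    x_adv = x + cache_deltas[nn_idx]

    # Adversarial regularization: fit the clean data while also fitting the
    # perturbed copies.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
```

The saving comes from amortization: the expensive gradient-based perturbation step runs only once every REFRESH_EVERY epochs and only on CACHE_SIZE samples, while the cheap nearest-neighbor lookup supplies perturbations for all samples in every epoch.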