Using generated data to improve the performance of downstream discriminative models has recently gained popularity, driven by the rapid progress of pre-trained language models. In most previous studies, the generative model and the discriminative model are trained separately and thus cannot adapt to changes in each other. As a result, the generated samples easily deviate from the real data distribution, and the improvement of the discriminative model quickly saturates. Generative adversarial networks (GANs) train generative models via an adversarial process against discriminative models to achieve joint training. However, the training of standard GANs is notoriously unstable and often fails to converge. In this paper, to address these issues, we propose a $\textit{self-consistent learning}$ framework, in which a discriminator and a generator are cooperatively trained in a closed-loop form. The discriminator and the generator enhance each other over multiple rounds of alternating training until a scoring consensus is reached. This framework proves easy to train and free from instabilities such as mode collapse and non-convergence. Extensive experiments on sentence semantic matching demonstrate the effectiveness of the proposed framework: the discriminator gains more than 10 points of AP in the zero-shot setting and achieves new state-of-the-art performance in the full-data setting.
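The closed-loop alternating training described above can be illustrated with a deliberately simplified 1-D toy: the "generator" is a Gaussian with a learnable mean, and the "discriminator" scores samples by their closeness to its current estimate of the data mean. In each round the generator proposes candidates, the discriminator keeps only the ones it scores highly, and both update toward the accepted samples until their scores agree. All names, update rules, and the 1-D setup are illustrative assumptions for exposition, not the paper's actual models.

```python
import random


def self_consistent_loop(real_mean=5.0, rounds=10, n=200, seed=0):
    """Toy sketch of closed-loop cooperative training (illustrative only).

    The generator samples from N(gen_mean, 1); the discriminator
    accepts a sample if it lies within 2.0 of disc_mean. Each round,
    both sides move toward the samples they agreed on, so the gap
    between their 'scores' (here, their means) shrinks: a crude
    stand-in for reaching a scoring consensus.
    """
    rng = random.Random(seed)
    gen_mean = 2.0          # generator starts away from the real data
    disc_mean = real_mean   # discriminator starts trained on real data
    gap_history = []
    for _ in range(rounds):
        # generator proposes candidate samples
        candidates = [rng.gauss(gen_mean, 1.0) for _ in range(n)]
        # discriminator keeps only samples it scores highly
        kept = [x for x in candidates if abs(x - disc_mean) < 2.0]
        if kept:
            kept_mean = sum(kept) / len(kept)
            # generator step: move toward the accepted samples
            gen_mean += 0.5 * (kept_mean - gen_mean)
            # discriminator step: refine its estimate with accepted samples
            disc_mean += 0.1 * (kept_mean - disc_mean)
        gap_history.append(abs(gen_mean - disc_mean))
    return gen_mean, disc_mean, gap_history
```

Running the loop shows the generator and discriminator means drawing together round by round, which is the toy analogue of the two models enhancing each other until consensus; no adversarial objective is involved, so there is no mode-collapse dynamic to destabilize training.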