Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player minimax game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, because high-variance gradient updates might make training excessively long and cause convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective, where the ASR model acts as the generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that its output (soft distribution vectors) receives higher scores from the discriminator and makes the discriminator's task harder within our GAN framework, which in turn improves the performance of the ASR model in the fine-tuning stage. Here, the pre-trained ASR model is fine-tuned adversarially against the discriminator using an additional adversarial loss. Experiments on the full LibriSpeech dataset show that our proposed approach outperforms baselines and conventional GAN-based adversarial models.
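The fine-tuning setup described above can be illustrated with a minimal sketch. The abstract does not give the exact losses, so everything here is an assumption for illustration: frame-level cross-entropy stands in for the ASR loss, the adversarial term is the common non-saturating GAN generator loss, `toy_discriminator` is a hypothetical hand-written scorer rather than a trained network, and `adv_weight` is a made-up weighting parameter.

```python
import math

def one_hot(index, vocab_size):
    """Real sample for the discriminator: a one-hot vector for a reference token."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def toy_discriminator(sequence, weight=4.0, bias=-2.0):
    """Hypothetical stand-in for a learned discriminator (assumption).

    Scores how peaked the per-frame distributions are and returns a
    probability in (0, 1) that the sequence is real data.
    """
    peak = sum(max(frame) for frame in sequence) / len(sequence)
    return 1.0 / (1.0 + math.exp(-(weight * peak + bias)))

def discriminator_loss(real_seq, fake_seq):
    """Standard GAN discriminator loss: -log D(real) - log(1 - D(fake))."""
    return (-math.log(toy_discriminator(real_seq))
            - math.log(1.0 - toy_discriminator(fake_seq)))

def adversarial_loss(fake_seq):
    """Non-saturating generator loss: -log D(G(x)) (assumed form)."""
    return -math.log(toy_discriminator(fake_seq))

def cross_entropy(fake_seq, targets):
    """Frame-level cross-entropy, standing in for the ASR training loss."""
    return -sum(math.log(frame[t]) for frame, t in zip(fake_seq, targets)) / len(targets)

def fine_tuning_loss(fake_seq, targets, adv_weight=0.1):
    """ASR loss plus an additional, weighted adversarial loss."""
    return cross_entropy(fake_seq, targets) + adv_weight * adversarial_loss(fake_seq)

# Reference tokens and the corresponding "real" one-hot sequence.
targets = [0, 2]
real = [one_hot(t, 3) for t in targets]

# Soft outputs of a pre-trained (confident) model vs. an untrained (uncertain) one.
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]

print("D(confident) =", toy_discriminator(confident))
print("D(uncertain) =", toy_discriminator(uncertain))
print("discriminator loss vs confident:", discriminator_loss(real, confident))
print("discriminator loss vs uncertain:", discriminator_loss(real, uncertain))
print("fine-tuning loss (confident):", fine_tuning_loss(confident, targets))
```

Under these assumptions, the confident soft distributions receive a higher discriminator score than the uncertain ones and leave the discriminator with a larger loss, which mirrors the hypothesis that a pre-trained generator makes the discriminator's task harder.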