ASR can be improved by multi-task learning (MTL) with domain-enhancing or domain-adversarial training, two opposite objectives that aim to increase or decrease domain variance toward domain-aware or domain-agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve Conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to show the optimal positions in the ASR neural network (NN) to apply speaker-enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves 7\% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL w.r.t. cleaner datasets and weaker ASR NNs.
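The gradient reversal layer mentioned above follows a standard pattern: it is the identity in the forward pass, but negates and scales the gradient in the backward pass, so the shared encoder is trained against the speaker classifier. A minimal NumPy sketch of this pattern follows; the fixed scale `lam` here is an illustrative placeholder, not the paper's adaptive scheme:

```python
import numpy as np

class GradientReversal:
    """Gradient reversal layer: identity forward, -lam * grad backward.

    `lam` is fixed here for illustration; the adaptive variant proposed
    in the paper adjusts this scale automatically (details not shown).
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged to the domain (speaker) classifier.
        return x

    def backward(self, grad_out):
        # The reversed, scaled gradient flows back into the shared encoder,
        # pushing its representations toward speaker-agnostic features.
        return -self.lam * grad_out
```

In an adversarial MTL setup, this layer sits between the shared encoder and the speaker classifier: the classifier still minimizes its own loss, while the encoder receives the reversed gradient and learns to remove speaker information.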