The cross-domain performance of automatic speech recognition (ASR) can be severely degraded by the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at both the acoustic and linguistic levels, unsupervised domain adaptation (UDA) for ASR is challenging. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective for UDA by exploiting self-supervision from unlabeled data. However, these self-supervised objectives also degrade under mismatched domain distributions, which previous work fails to address. This work presents a systematic UDA framework that fully utilizes unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: first, we design a dual-branch PL method to reduce sensitivity to erroneous pseudo-labels; second, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; third, we introduce a two-step PL approach to incorporate target-domain linguistic knowledge and thus generate more accurate target-domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts cross-domain performance and significantly outperforms previous approaches.
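The abstract mentions an uncertainty-aware confidence filtering strategy for pseudo-labels. As a rough illustration of the general idea (not the paper's actual method), the sketch below keeps only utterances whose decoded tokens have high average posterior probability; the function name, data layout, and threshold are all hypothetical assumptions.

```python
def filter_pseudo_labels(utterances, confidence_threshold=0.9):
    """Keep pseudo-labeled utterances whose average per-token
    confidence exceeds a threshold (illustrative sketch only).

    Each utterance is assumed to be a dict with:
      - "text":        the decoded pseudo-label
      - "token_probs": posterior probability of each decoded token
    Low average confidence is treated as high uncertainty, and such
    utterances are dropped before fine-tuning on pseudo-labels.
    """
    kept = []
    for utt in utterances:
        probs = utt["token_probs"]
        avg_conf = sum(probs) / len(probs)
        if avg_conf >= confidence_threshold:
            kept.append(utt)
    return kept
```

In practice, the filtering signal could instead be per-token entropy or an ensemble disagreement measure; the thresholding structure stays the same.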