End-to-end automatic speech recognition (ASR) usually suffers from performance degradation when applied to a new domain due to domain shift. Unsupervised domain adaptation (UDA) aims to improve performance on the unlabeled target domain by transferring knowledge from the source domain to the target domain. To improve transferability, existing UDA approaches mainly focus on matching the distributions of the source and target domains globally and/or locally, while ignoring model discriminability. In this paper, we propose a novel UDA approach for ASR via inter-domain MAtching and intra-domain DIscrimination (MADI), which simultaneously improves model transferability via fine-grained inter-domain matching and model discriminability via intra-domain contrastive discrimination. Evaluations on the Libri-Adapt dataset demonstrate the effectiveness of our approach. MADI reduces the word error rate (WER) on cross-device and cross-environment ASR by 17.7% and 22.8% relative, respectively.
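To make the two ingredients named in the abstract concrete, the following is a minimal PyTorch sketch of a combined UDA objective for ASR: a supervised CTC loss on labeled source speech, an inter-domain matching term (illustrated here with a simple MMD), and an intra-domain contrastive (InfoNCE) term on unlabeled target features. This is not the paper's actual MADI objective; all function names, batch keys, and loss weights (`uda_step`, `lam_match`, `lam_contrast`, etc.) are illustrative assumptions.

```python
# Hedged sketch of a source-supervised + inter-domain matching + intra-domain
# contrastive objective for UDA in ASR. Names and shapes are assumptions, not
# the MADI implementation from the paper.

import torch
import torch.nn.functional as F


def mmd_loss(src_feats, tgt_feats):
    """Linear-kernel MMD between mean source and target features (inter-domain matching)."""
    return (src_feats.mean(dim=0) - tgt_feats.mean(dim=0)).pow(2).sum()


def info_nce_loss(anchors, positives, temperature=0.1):
    """Intra-domain contrastive loss: each anchor should match its own positive view."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, labels)


def uda_step(encoder, ctc_head, src_batch, tgt_batch, lam_match=0.1, lam_contrast=0.1):
    # Labeled source domain: standard CTC objective.
    src_feats = encoder(src_batch["audio"])                  # (B, T, D)
    log_probs = ctc_head(src_feats).log_softmax(-1)          # (B, T, V)
    ctc = F.ctc_loss(log_probs.transpose(0, 1),              # CTC expects (T, B, V)
                     src_batch["tokens"],
                     src_batch["feat_lens"],
                     src_batch["token_lens"])

    # Unlabeled target domain: two augmented views of the same utterances.
    tgt_feats_a = encoder(tgt_batch["audio_view1"]).mean(dim=1)   # (B, D) pooled
    tgt_feats_b = encoder(tgt_batch["audio_view2"]).mean(dim=1)

    match = mmd_loss(src_feats.mean(dim=1), tgt_feats_a)     # inter-domain matching
    contrast = info_nce_loss(tgt_feats_a, tgt_feats_b)       # intra-domain discrimination

    return ctc + lam_match * match + lam_contrast * contrast
```

The design choice this sketch illustrates is the one the abstract argues for: the matching term alone aligns source and target feature distributions (transferability), while the contrastive term keeps target utterances separable from one another (discriminability), so both are optimized jointly rather than matching distributions in isolation.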