While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation), which zeroes out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR fine-tuning. The redundant weights can be identified through various pruning strategies, which we discuss in detail in this work. Specifically, we investigate the effect of the recently proposed Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which sets it apart from the other pruning strategies discussed in this paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.
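To make the CD-TAW idea concrete, the following is a minimal sketch, assuming PyTorch, unstructured magnitude pruning, and hypothetical checkpoint file names; it is an illustration of the described procedure (derive a pruning mask from a fine-tuned OOD model, apply it to the OOD pre-trained SSL model, then fine-tune on the target domain), not the authors' released implementation.

```python
# Hedged sketch of Cross-Domain Task-Aware Pruning (CD-TAW).
# Assumptions: checkpoints are plain state dicts; magnitude pruning is used
# to pick redundant weights; file names below are hypothetical.
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    k = int(weight.numel() * sparsity)          # number of weights to zero out
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()


def apply_cd_taw(pretrained_state: dict, finetuned_ood_state: dict, sparsity: float) -> dict:
    """Zero out weights of the pre-trained SSL model using masks taken
    from a model fine-tuned on out-of-domain (OOD) labelled data."""
    pruned_state = {}
    for name, w_pre in pretrained_state.items():
        w_ood = finetuned_ood_state.get(name)
        if w_ood is not None and w_ood.dim() >= 2:      # prune only weight matrices
            mask = magnitude_mask(w_ood, sparsity)      # mask comes from the OOD fine-tuned model
            pruned_state[name] = w_pre * mask           # make space for target-domain fine-tuning
        else:
            pruned_state[name] = w_pre                  # leave biases / norms untouched
    return pruned_state


# Usage (hypothetical paths): prune the pre-trained model, then fine-tune the
# pruned weights on the target-domain data (e.g. a 2-hour Switchboard subset).
pretrained = torch.load("wav2vec2_base_pretrained.pt")   # OOD pre-trained SSL model
finetuned_ood = torch.load("wav2vec2_base_ood_ft.pt")    # same model fine-tuned on OOD labels
pruned = apply_cd_taw(pretrained, finetuned_ood, sparsity=0.3)
```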