A Unified Speaker Adaptation Approach for ASR

Transformer models have been used successfully in automatic speech recognition (ASR) and yield state-of-the-art results. However, their performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target-speaker data is the most natural approach to adaptation, but it requires a lot of compute and may cause catastrophic forgetting of the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model, which generalizes better to unseen test speakers by using speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture; to the best of our knowledge, this has never been explored in ASR. Specifically, we gradually prune less-contributing parameters in the model encoder to a certain sparsity level and use the pruned parameters for adaptation, while freezing the unpruned parameters to preserve the original model's performance. We conduct experiments on the LibriSpeech dataset. Our proposed approach brings a relative word error rate (WER) reduction of 2.74-6.52% on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction and surpasses the finetuning method by up to 2.54% relative. Moreover, with extremely low-resource adaptation data (e.g., a single utterance), our method improves WER by 6.53% relative with only a few epochs of training.
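The gradual-pruning idea in the abstract (prune less-contributing encoder weights to a target sparsity, then adapt only the pruned slots while the rest stay frozen) can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies neither the pruning criterion nor the schedule, so the sketch assumes magnitude-based pruning (smallest |w| treated as "less contributing") and the cubic sparsity ramp of Zhu and Gupta (2017). All function names (`cubic_sparsity`, `prune_mask`, `gradual_prune`, `masked_sgd_step`) are hypothetical, and a single random matrix stands in for an encoder layer.

```python
import numpy as np


def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target sparsity at `step` (ASSUMED cubic ramp, after Zhu & Gupta, 2017)."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - t) ** 3


def prune_mask(weights, sparsity):
    """Boolean mask, True = weight is KEPT.

    ASSUMPTION: "less contributing" is approximated by smallest |w|.
    """
    k = int(round(sparsity * weights.size))  # number of weights to prune
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude weights.
        smallest = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
        mask[smallest] = False
    return mask.reshape(weights.shape)


def gradual_prune(weights, final_sparsity, total_steps):
    """Raise sparsity step by step; a pruned weight stays pruned."""
    w = weights.copy()
    keep = np.ones(w.shape, dtype=bool)
    for step in range(1, total_steps + 1):
        target = cubic_sparsity(step, total_steps, final_sparsity)
        keep &= prune_mask(w, target)
        w = w * keep  # zero out pruned slots
        # A real system would fine-tune on the original training data
        # between pruning steps; that training loop is omitted here.
    return w, keep


def masked_sgd_step(weights, grad, keep, lr):
    """One plain SGD step on speaker data that updates ONLY the pruned slots.

    Kept (unpruned) weights receive a zero update, i.e. they are frozen.
    """
    return weights - lr * np.where(keep, 0.0, grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 8))  # stand-in for one encoder weight matrix
    w_pruned, keep = gradual_prune(w, final_sparsity=0.5, total_steps=4)
    print(1 - keep.mean())  # fraction of pruned weights: 0.5

    grad = rng.normal(size=w.shape)  # stand-in for a target-speaker gradient
    w_adapted = masked_sgd_step(w_pruned, grad, keep, lr=0.1)
    print(np.allclose(w_adapted[keep], w_pruned[keep]))  # True: kept weights frozen
```

The property the abstract relies on shows up in `masked_sgd_step`: because the update is zeroed wherever `keep` is True, the surviving weights are never touched during adaptation, while the freed (previously zeroed) slots absorb speaker-specific information without changing the model's shape.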


Related content

MoDELS


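The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition offers the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/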