Investigation of Ensemble Features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of them optimizes a different loss, which raises the possibility that their features are complementary. This paper proposes using an ensemble of such SSL representations and models, which exploits the complementary nature of the features extracted by the various pretrained models. We hypothesize that this results in a richer feature representation, and we show results for the downstream ASR task. To this end, we use three SSL models that have shown excellent results on ASR tasks, namely HuBERT, wav2vec 2.0, and WavLM. We explore two settings: an ensemble of models fine-tuned for the ASR task, and an ensemble of features that uses the embeddings obtained from the pretrained models for a downstream ASR task. On the LibriSpeech (100h) and WSJ datasets, both ensembles improve performance over the individual models and the individual pretrained features.
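The abstract does not say how the ensembles are formed, so the following is only a minimal sketch under stated assumptions: the feature ensemble is taken to be frame-wise concatenation of the embeddings, and the model ensemble is taken to be frame-wise averaging of the output distributions. The function names `concat_features` and `average_posteriors` are hypothetical, and the toy vectors stand in for real HuBERT, wav2vec 2.0, and WavLM outputs.

```python
from typing import List, Sequence

Frames = List[List[float]]  # one feature (or probability) vector per frame


def concat_features(streams: Sequence[Frames]) -> Frames:
    """Feature ensemble (assumed): concatenate per-frame vectors from each model.

    Assumes all streams share one frame rate. Streams are truncated to the
    shortest one to absorb small length differences between models.
    """
    if not streams:
        raise ValueError("need at least one feature stream")
    n_frames = min(len(s) for s in streams)
    fused: Frames = []
    for t in range(n_frames):
        frame: List[float] = []
        for s in streams:
            frame.extend(s[t])  # append this model's vector for frame t
        fused.append(frame)
    return fused


def average_posteriors(posteriors: Sequence[Frames]) -> Frames:
    """Model ensemble (assumed): average per-frame output distributions.

    Assumes every model emits distributions over the same output vocabulary.
    """
    if not posteriors:
        raise ValueError("need at least one model output")
    n_frames = min(len(p) for p in posteriors)
    n_models = len(posteriors)
    averaged: Frames = []
    for t in range(n_frames):
        columns = zip(*(p[t] for p in posteriors))  # group values per class
        averaged.append([sum(col) / n_models for col in columns])
    return averaged


# Toy stand-ins for embeddings from three pretrained models.
hubert = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, dim 2
wav2vec2 = [[1.0], [2.0], [3.0]]               # 3 frames, dim 1
wavlm = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0]]     # 2 frames, dim 3

fused = concat_features([hubert, wav2vec2, wavlm])
print(len(fused), len(fused[0]))  # 2 frames, each of dimension 2 + 1 + 3 = 6
```

In this sketch the fused vectors would feed a single downstream ASR model, whereas the averaged distributions combine several already fine-tuned models at decoding time. Other fusion schemes, such as learned weighted sums, are equally compatible with the abstract.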

