We introduce a self-supervised speech pre-training method called TERA, which stands for Transformer Encoder Representations from Alteration. Recent approaches often learn through a single auxiliary task such as contrastive prediction, autoregressive prediction, or masked reconstruction. Unlike previous methods, we use alteration along three orthogonal axes to pre-train Transformer Encoders on a large amount of unlabeled speech. The model learns by reconstructing acoustic frames from their altered counterparts, where a stochastic policy alters the input along three dimensions: time, frequency, and magnitude. TERA can be used for speech representation extraction or fine-tuning with downstream models. We evaluate TERA on several downstream tasks, including phoneme classification, keyword spotting, speaker recognition, and speech recognition. We present a large-scale comparison of various self-supervised models; TERA achieves strong performance in this comparison, improving upon surface features and outperforming previous models. In our experiments, we study the effect of applying different alteration techniques, pre-training on more data, and pre-training on various features. We analyze different model sizes and find that smaller models are stronger representation learners than larger models, while larger models are more effective for downstream fine-tuning. Furthermore, we show that the proposed method is transferable to downstream datasets not used in pre-training.
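To make the alteration step concrete, the sketch below applies a stochastic policy along the three axes named above (time, frequency, and magnitude) to a log-mel spectrogram. The block widths, probabilities, and noise scale are illustrative assumptions rather than the exact pre-training configuration; the model would then be trained to reconstruct the original frames from the altered input.

```python
import numpy as np

def alter_spectrogram(x, rng,
                      time_mask_ratio=0.15, time_block=7,
                      freq_block=8, mag_noise_prob=0.1, noise_std=0.2):
    """Stochastic alteration of a (T, F) log-mel spectrogram along three axes.
    All hyperparameters here are assumptions chosen for illustration."""
    x = x.copy()
    T, F = x.shape

    # Time alteration: zero out randomly placed blocks of consecutive frames.
    n_blocks = max(1, int(T * time_mask_ratio / time_block))
    for _ in range(n_blocks):
        t0 = rng.integers(0, max(1, T - time_block))
        x[t0:t0 + time_block, :] = 0.0

    # Frequency alteration: zero out one randomly placed block of frequency bins.
    f0 = rng.integers(0, max(1, F - freq_block))
    x[:, f0:f0 + freq_block] = 0.0

    # Magnitude alteration: add Gaussian noise to a random subset of frames.
    noisy = rng.random(T) < mag_noise_prob
    x[noisy, :] += rng.normal(0.0, noise_std, size=(int(noisy.sum()), F))

    return x

# Usage: the altered input is fed to the encoder, which reconstructs the clean frames.
clean = np.random.randn(400, 80).astype(np.float32)
altered = alter_spectrogram(clean, np.random.default_rng(0))
```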