While audio-visual speech models can yield superior performance and robustness compared to audio-only models, their development and adoption are hindered by the lack of labeled and unlabeled audio-visual data and the cost of deploying one model per modality. In this paper, we present u-HuBERT, a self-supervised pre-training framework that can leverage both multimodal and unimodal speech with a unified masked cluster prediction objective. By utilizing modality dropout during pre-training, we demonstrate that a single fine-tuned model can achieve performance on par with or better than state-of-the-art modality-specific models. Moreover, our model fine-tuned only on audio performs well with audio-visual and visual speech input, achieving zero-shot modality generalization for multiple speech processing tasks. In particular, our single model yields 1.2%/1.4%/27.2% speech recognition word error rate on LRS3 with audio-visual/audio/visual input. Code and models are available at https://github.com/facebookresearch/av_hubert
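To illustrate the modality dropout idea mentioned above, below is a minimal sketch, not the official av_hubert implementation: during pre-training, one modality's features are randomly zeroed out so the same encoder learns from audio-visual, audio-only, and video-only input. The function name, the drop probability, and the fusion-by-summation are illustrative assumptions.

```python
# Hypothetical sketch of modality dropout for audio-visual pre-training.
# Assumes audio/video features are frame-aligned tensors of shape (batch, time, dim).
import torch

def modality_dropout(audio_feats, video_feats, p_drop=0.5, training=True):
    """Randomly drop one modality so the model also learns from unimodal input."""
    if training and torch.rand(1).item() < p_drop:
        if torch.rand(1).item() < 0.5:
            audio_feats = torch.zeros_like(audio_feats)  # keep video only
        else:
            video_feats = torch.zeros_like(video_feats)  # keep audio only
    # Fuse by summation (one simple choice, assumed here); the fused features
    # would then feed the Transformer trained with masked cluster prediction.
    return audio_feats + video_feats

# Usage sketch:
# fused = modality_dropout(audio_feats, video_feats, p_drop=0.5, training=model.training)
```

Because the encoder sees unimodal inputs during pre-training, a model fine-tuned on one modality can still be evaluated on the others, which is the zero-shot modality generalization described in the abstract.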