TransFusion: Transcribing Speech with Multinomial Diffusion

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to iteratively refine a sampled noise signal until it resembles a coherent signal (such as an image or written sentence). In this work we aim to see whether the benefits of diffusion models can also be realized for speech recognition. To this end, we propose a new way to perform speech recognition using a diffusion model conditioned on pretrained speech features. Specifically, we propose TransFusion: a transcribing diffusion model which iteratively denoises a random character sequence into coherent text corresponding to the transcript of a conditioning utterance. We demonstrate comparable performance to existing high-performing contrastive models on the LibriSpeech speech recognition benchmark. To the best of our knowledge, we are the first to apply denoising diffusion to speech recognition. We also propose new techniques for effectively sampling and decoding multinomial diffusion models. These are required because traditional methods of sampling from acoustic models are not possible with our new discrete diffusion approach. Code and trained models are available: https://github.com/RF5/transfusion-asr
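To make the "random character sequence" concrete, here is a minimal sketch of the forward (noising) half of a multinomial diffusion process over characters. The abstract does not spell this process out, so the sketch assumes the standard multinomial formulation, in which each character is kept with probability ᾱ_t and otherwise resampled uniformly from the vocabulary. The character set, the linear noise schedule, and the names `VOCAB`, `cumulative_alphas`, and `q_sample` are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np

# Hypothetical character vocabulary (an assumption for illustration only).
VOCAB = list("abcdefghijklmnopqrstuvwxyz '")


def cumulative_alphas(num_steps, beta_min=1e-4, beta_max=0.02):
    """Cumulative keep-probabilities for an assumed linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(text, alpha_bar_t, rng):
    """Corrupt `text` by sampling each character from
    Cat(alpha_bar_t * onehot(x_0) + (1 - alpha_bar_t) / K)."""
    num_classes = len(VOCAB)
    ids = np.array([VOCAB.index(c) for c in text])
    onehot = np.eye(num_classes)[ids]
    # Mix the clean one-hot distribution with the uniform distribution.
    probs = alpha_bar_t * onehot + (1.0 - alpha_bar_t) / num_classes
    noisy_ids = [rng.choice(num_classes, p=p) for p in probs]
    return "".join(VOCAB[i] for i in noisy_ids)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = cumulative_alphas(200)
    # Later timesteps keep fewer of the original characters.
    for t in (0, 99, 199):
        print(t, repr(q_sample("the quick brown fox", alpha_bars[t], rng)))
```

With ᾱ_t = 1 the text is returned unchanged, and with ᾱ_t = 0 every character is drawn uniformly at random. A transcribing model of the kind the abstract describes would learn to reverse this corruption step by step while conditioned on speech features; that learned reverse process is not shown here.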
