Deep learning models are mostly used for offline inference. However, this strongly limits their use in audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks adapt naturally to this buffer-based computation, the use of convolutions still poses serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires a specific, more complex training procedure and can degrade the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models, which makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it can transform models trained without causal constraints into streaming models. We also show how our method can be adapted to complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods while having no impact on generation quality. Finally, we introduce two open-source implementations of our work, as Max/MSP and PureData externals and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.
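To give a concrete intuition for the streaming reconfiguration the abstract describes, the following is a minimal sketch (not the authors' implementation, and omitting the latency handling needed for non-causal padding): a Conv1d whose zero-padding is replaced by a persistent cache of past samples, so that successive audio buffers can be processed one at a time while matching the output of a single offline pass. The class name StreamingConv1d is ours, chosen for illustration.

```python
# Illustrative sketch of cached padding, assuming PyTorch.
import torch
import torch.nn as nn

class StreamingConv1d(nn.Module):
    """Conv1d whose left zero-padding is replaced by an internal cache,
    making buffer-based processing equivalent to one offline call."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # No padding here: past context is supplied by the cache instead.
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)
        # Cache holds the last (kernel_size - 1) input samples seen so far;
        # its initial zeros play the role of the usual zero-padding.
        self.register_buffer(
            "cache", torch.zeros(1, in_channels, kernel_size - 1)
        )

    def forward(self, x):
        # Prepend the cached context to the incoming buffer, then store
        # the newest samples as context for the next call.
        x = torch.cat([self.cache, x], dim=-1)
        self.cache = x[..., -(self.conv.kernel_size[0] - 1):].detach()
        return self.conv(x)

# Processing a signal in buffers gives the same result as one offline
# pass over the padded signal, with no overlap-add reconstruction.
conv = StreamingConv1d(1, 1, kernel_size=3)
signal = torch.randn(1, 1, 8)
streamed = torch.cat(
    [conv(chunk) for chunk in signal.split(4, dim=-1)], dim=-1
)
```

This sketch only covers the causal case; the method described in the paper additionally converts the right-hand (non-causal) part of the padding into an internal delay, which is what allows models trained without causal constraints to be reconfigured after training.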