We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics of a target timbre. Once trained, a NEWT can produce complex timbral evolutions by simple affine transformations of its input and output signals. We paired the NEWT with a differentiable noise synthesiser and reverb and found it capable of generating realistic musical instrument performances with only 260k total model parameters, conditioned on F0 and loudness features. We compared our method to state-of-the-art benchmarks with a multi-stimulus listening test and the Fréchet Audio Distance and found it performed competitively across the tested timbral domains. Our method significantly outperformed the benchmarks in terms of generation speed, and achieved real-time performance on a consumer CPU, both with and without FastNEWT, suggesting it is a viable basis for future creative sound design tools.
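The waveshaping idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the weights are random and untrained, and the names `init_shaper`, `shaper` and `waveshape`, the layer sizes, the sample rate and the 220 Hz exciter are all assumptions chosen for illustration. It shows only the general mechanism: a small MLP with sine activations is applied to each sample independently, and affine transformations of its input and output vary the resulting timbre.

```python
import numpy as np

def init_shaper(rng, hidden=8, layers=3):
    """Random weights for a tiny 1 -> hidden -> ... -> 1 MLP (untrained, illustrative)."""
    sizes = [1] + [hidden] * layers + [1]
    return [(rng.normal(0.0, 1.0, (m, n)), rng.normal(0.0, 1.0, n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def shaper(x, params):
    """Apply the same MLP to every sample independently (time-distributed)."""
    h = x[:, None]                      # shape (T, 1)
    for W, b in params[:-1]:
        h = np.sin(h @ W + b)           # periodic activation
    W, b = params[-1]
    return (h @ W + b)[:, 0]            # shape (T,)

def waveshape(exciter, params, in_scale, in_shift, out_scale, out_shift):
    """Affine transform -> learned transfer function -> affine transform."""
    return out_scale * shaper(in_scale * exciter + in_shift, params) + out_shift

rng = np.random.default_rng(0)
sr, f0 = 16000, 220.0                     # assumed sample rate and pitch
t = np.arange(sr // 10) / sr              # 100 ms of audio
exciter = np.sin(2 * np.pi * f0 * t)      # sinusoidal input signal
params = init_shaper(rng)

# A time-varying input scale sweeps the sinusoid across more of the
# transfer function, changing the harmonic content over time.
in_scale = np.linspace(0.1, 1.0, t.size)
y = waveshape(exciter, params, in_scale, 0.0, 1.0, 0.0)
```

Because the shaper in this sketch is memoryless, each output sample depends only on the input sample at the same time step, so the operation is trivially causal. In a trained model the affine parameters would presumably be predicted from the F0 and loudness conditioning rather than set by hand as they are here.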