Artificial reverberation (AR) models play a central role in various audio applications, so estimating the AR model parameters (ARPs) of a reference reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivates the introduction of differentiable artificial reverberation (DAR) models, which allow loss gradients to be back-propagated end-to-end. However, implementing the AR models with their difference equations "as is" in a deep learning framework severely bottlenecks training speed on a parallel processor such as a GPU, because their infinite impulse response (IIR) components are computed recursively, sample by sample. We tackle this problem by replacing the IIR filters with finite impulse response (FIR) approximations obtained with the frequency-sampling method. Using this technique, we implement three DAR models: differentiable Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and Delay Network (DN). For each AR model, we train its ARP estimation networks for the analysis-synthesis (RIR-to-ARP) and blind estimation (reverberant-speech-to-ARP) tasks in an end-to-end manner with its DAR model counterpart. Experimental results show that the proposed method consistently improves on the non-end-to-end approaches in both objective metrics and subjective listening tests.
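
To illustrate the frequency-sampling idea mentioned above, here is a minimal NumPy sketch. It is not the implementation used for FVN, AFVN, or DN: the helper name `freq_sampled_fir`, the one-pole example filter, and the FFT size are all assumptions chosen for illustration. The sketch samples an IIR transfer function B(z)/A(z) at uniformly spaced points on the unit circle and applies an inverse FFT to obtain FIR taps, so no sample-by-sample recursion is needed.

```python
import numpy as np

def freq_sampled_fir(b, a, n_fft):
    """Approximate the IIR filter B(z)/A(z) with a length-n_fft FIR filter.

    Hypothetical helper for illustration only. The frequency response is
    sampled at n_fft uniformly spaced points on the unit circle, and an
    inverse FFT converts those samples into FIR taps.
    """
    # Zero-padded FFTs evaluate B and A at z = exp(j*2*pi*k/n_fft).
    B = np.fft.rfft(b, n_fft)
    A = np.fft.rfft(a, n_fft)
    H = B / A  # sampled frequency response of the IIR filter
    return np.fft.irfft(H, n_fft)

# Assumed example filter: one-pole recursion y[n] = x[n] + 0.9 * y[n-1].
pole = 0.9
n_fft = 256
h_fir = freq_sampled_fir(np.array([1.0]), np.array([1.0, -pole]), n_fft)

# The exact impulse response is pole**n. Sampling in frequency wraps the
# infinite tail back into the first n_fft taps (time-domain aliasing).
h_true = pole ** np.arange(n_fft)
h_aliased = h_true / (1.0 - pole ** n_fft)
max_err = np.max(np.abs(h_fir - h_true))
```

In this sketch the FIR taps equal the true impulse response plus its time-aliased tail, so the approximation error shrinks as `n_fft` grows relative to the filter's decay time. In a deep learning framework, the same steps would presumably be written with that framework's differentiable FFT operations so that gradients can flow back to the filter coefficients.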