In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks: a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low-pass filter when training and subsequently testing the network. For two state-of-the-art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and, under some training conditions, the output has a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy that uses multiple low-pass filters during training and improves generalization to unseen filtering conditions at test time.
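A minimal sketch of the multi-filter augmentation idea, assuming NumPy: each full-band training example is paired with a band-limited input produced by a low-pass filter drawn at random from a small bank, and outputs are scored with the standard SNR, i.e. 10·log10 of reference energy over error energy. The filter bank (windowed-sinc FIR designs with Hamming, Hann and Blackman windows), the helper names `windowed_sinc_lowpass`, `band_limit`, `make_training_pair` and `snr_db`, and the cutoff convention (cycles/sample) are hypothetical illustrations, not the filters or code used in the paper.

```python
import numpy as np

# Hypothetical filter bank: (number of taps, window function) pairs.
# These are illustrative choices, not the filters used in the paper.
# Odd tap counts give symmetric FIR filters with an integer group delay,
# so np.convolve(..., mode="same") keeps the output time-aligned.
FILTER_BANK = [
    (63, np.hamming),
    (101, np.hanning),
    (127, np.blackman),
]


def windowed_sinc_lowpass(cutoff, num_taps, window):
    """Windowed-sinc FIR low-pass; cutoff in cycles/sample, 0 < cutoff < 0.5."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)  # ideal low-pass impulse response
    h = h * window(num_taps)                   # taper it to a finite length
    return h / h.sum()                         # normalize to unity gain at DC


def band_limit(x, cutoff, rng):
    """Low-pass x with a filter drawn at random from FILTER_BANK."""
    num_taps, window = FILTER_BANK[rng.integers(len(FILTER_BANK))]
    h = windowed_sinc_lowpass(cutoff, num_taps, window)
    return np.convolve(x, h, mode="same")


def make_training_pair(full_band, cutoff, rng):
    """Return (network input, target): a band-limited copy and the original."""
    return band_limit(full_band, cutoff, rng), full_band


def snr_db(reference, estimate):
    """SNR in dB: reference energy over the energy of the error."""
    error = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


# Usage: a 440 Hz + 6 kHz test signal at 16 kHz, band-limited at 4 kHz.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.25 * np.sin(2 * np.pi * 6000 * t)
x_in, x_target = make_training_pair(x, cutoff=4000 / sr, rng=rng)
print(round(snr_db(x_target, x_in), 1))
```

Every filter in this bank removes the 6 kHz component, so the band-limited input scores about 12 dB against the full-band reference regardless of which filter is drawn. The filters differ mainly in transition width and stopband attenuation, which is the kind of shape variation a network trained with a single filter never encounters; sampling a new filter for each example exposes the network to that variation during training.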