Audio denoising has been explored for decades using both traditional and deep learning-based methods. However, these methods are still limited: they either rely on manually added artificial noise or produce denoised audio of low quality. To overcome these limitations, we collect a large-scale bird sound dataset with natural noise. We are the first to recast audio denoising as an image segmentation problem, and we propose a deep visual audio denoising (DVAD) model. To label the resulting 14,120 audio images, we develop an audio ImageMask tool and adopt a few-shot generalization strategy. Extensive experiments demonstrate that the proposed model achieves state-of-the-art performance. We also show that our method generalizes readily to speech denoising, audio separation, audio enhancement, and noise estimation.
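To make the idea of treating denoising as segmentation concrete, the following is a minimal sketch and not the DVAD model itself. It assumes that an "audio image" is a short-time Fourier transform (STFT) spectrogram, which the abstract does not specify, and it replaces the learned segmentation network with a simple magnitude threshold as a stand-in. All function names, the frame and hop sizes, and the threshold factor are hypothetical choices made for illustration.

```python
import numpy as np

FRAME = 256  # samples per analysis window (assumed value)
HOP = 64     # hop between windows, 75% overlap (assumed value)


def stft(x, frame=FRAME, hop=HOP):
    """Complex spectrogram of shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, frame=FRAME, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(frame)
    frames = np.fft.irfft(spec, n=frame, axis=1)
    length = frame + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame] += f * window
        norm[i * hop:i * hop + frame] += window ** 2
    return out / np.maximum(norm, 1e-8)


def threshold_mask(spec, factor=3.0):
    """Stand-in for a segmentation model: keep time-frequency cells whose
    magnitude exceeds `factor` times the median magnitude (an assumption)."""
    mag = np.abs(spec)
    return mag > factor * np.median(mag)


def denoise(x, mask_fn=threshold_mask):
    """Segment the spectrogram into signal and noise, then resynthesize."""
    padded = np.pad(x, FRAME)       # zero-pad so the edges get full window overlap
    spec = stft(padded)             # the "audio image"
    mask = mask_fn(spec)            # binary segmentation of that image
    y = istft(spec * mask)          # zero out cells labeled as noise
    return y[FRAME:FRAME + len(x)]  # drop the padding


# Synthetic demo: a 1 kHz tone in white noise, sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 1000.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)
denoised = denoise(noisy)
print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
print("MSE denoised:", np.mean((denoised - clean) ** 2))
```

In this sketch the mask is the only component a learned model would replace: any function that maps a spectrogram to a boolean array of the same shape can be passed as `mask_fn`, which is where a segmentation network's prediction would plug in.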