Audio source separation is often used as a preprocessing step in various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a wide variety of audio signals. Since the sampling frequency, one such signal property, is usually application specific, an audio source separation model used for preprocessing should be able to handle audio signals at all sampling frequencies specified by the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there is no guarantee that they work at unseen sampling frequencies. In this paper, we propose a convolution layer that enables a single DNN to handle arbitrary sampling frequencies. Through music source separation experiments, we show that introducing the proposed layer enables a conventional audio source separation model to work consistently even at unseen sampling frequencies.
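To illustrate the core idea, here is a minimal sketch of a sampling-frequency-independent convolution: the filter weights are defined as a continuous-time (analog) function and are discretized at whatever sampling frequency the input uses, so one set of parameters serves any rate. The windowed-sinc parameterization, the function names, and the kernel-support length below are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def continuous_kernel(t, cutoff_hz=4000.0):
    # Latent continuous-time filter: a Hann-windowed sinc low-pass.
    # Defining weights in continuous time lets us sample them at any
    # rate; this particular parameterization is an assumption for
    # illustration, not the paper's actual layer.
    h = 2.0 * cutoff_hz * np.sinc(2.0 * cutoff_hz * t)
    return h * np.hanning(len(t))

def sfi_conv(x, fs, support_sec=0.002):
    # Discretize the continuous kernel at the input's sampling
    # frequency fs; the tap count adapts so the kernel always spans
    # the same physical duration (support_sec seconds).
    n_taps = int(round(support_sec * fs)) | 1  # force an odd tap count
    t = (np.arange(n_taps) - n_taps // 2) / fs  # time axis in seconds
    w = continuous_kernel(t) / fs  # 1/fs scaling for the discrete sum
    return np.convolve(x, w, mode="same")

# The same "analog" parameters process signals at two different rates.
rng = np.random.default_rng(0)
y8k = sfi_conv(rng.standard_normal(8000), fs=8000)
y16k = sfi_conv(rng.standard_normal(16000), fs=16000)
```

In a DNN, each channel's continuous kernel would itself be learnable; the key point is only that discretization happens at inference time, per sampling frequency, rather than being baked in at training time.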