TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

In recent years, many deep learning techniques for single-channel sound source separation have been proposed using recurrent, convolutional and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise in addition to spectro-temporal diversity can be exploited by using multi-channel filters for sound source separation. Aiming at end-to-end multi-channel source separation, in this paper we propose a transformer-recurrent-U network (TRUNet), which directly estimates multi-channel filters from multi-channel input spectra. TRUNet consists of a spatial processing network with an attention mechanism across microphone channels aiming at capturing the spatial diversity, and a spectro-temporal processing network aiming at capturing spectral and temporal diversities. In addition to multi-channel filters, we also consider estimating single-channel filters from multi-channel input spectra using TRUNet. We train the network on a large reverberant dataset using a combined compressed mean-squared error loss function, which further improves the sound separation performance. We evaluate the network on a realistic and challenging reverberant dataset, generated from measured room impulse responses of an actual microphone array. The experimental results on realistic reverberant sound source separation show that the proposed TRUNet outperforms state-of-the-art single-channel and multi-channel source separation methods.
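The two ideas the abstract relies on, applying estimated multi-channel filters to multi-channel input spectra and training with a combined compressed mean-squared error, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: one complex filter tap per microphone and time-frequency bin, a compression exponent of 0.3, and equal weighting of a magnitude term and a complex term. The function names `filter_and_sum`, `compress`, and `compressed_mse` are hypothetical, and in TRUNet the filters would come from the network rather than being passed in by hand.

```python
from typing import List

Spectrum = List[List[complex]]  # indexed as [frame][frequency bin]


def filter_and_sum(filters: List[Spectrum], mixture: List[Spectrum]) -> Spectrum:
    """Apply one complex filter per microphone and sum over microphones.

    filters[m][t][f] and mixture[m][t][f] are indexed by microphone m,
    frame t, and frequency bin f. The output is sum_m conj(W_m) * Y_m.
    (Assumed single-tap form; the paper may use a different filter structure.)
    """
    n_frames = len(mixture[0])
    n_bins = len(mixture[0][0])
    return [
        [
            sum(w[t][f].conjugate() * y[t][f] for w, y in zip(filters, mixture))
            for f in range(n_bins)
        ]
        for t in range(n_frames)
    ]


def compress(x: complex, power: float) -> complex:
    """Compress the magnitude of x to |x|**power while keeping its phase."""
    mag = abs(x)
    if mag == 0.0:
        return 0j
    return (mag ** power) * (x / mag)


def compressed_mse(
    estimate: Spectrum, target: Spectrum, power: float = 0.3, weight: float = 0.5
) -> float:
    """Weighted sum of a compressed-magnitude error and a compressed-complex error.

    power and weight are illustrative defaults, not values from the paper.
    """
    count = 0
    mag_err = 0.0
    cplx_err = 0.0
    for est_row, tgt_row in zip(estimate, target):
        for e, s in zip(est_row, tgt_row):
            mag_err += (abs(e) ** power - abs(s) ** power) ** 2
            cplx_err += abs(compress(e, power) - compress(s, power)) ** 2
            count += 1
    return (weight * mag_err + (1.0 - weight) * cplx_err) / count
```

A single-channel filter, which the abstract also considers, would correspond to the special case where only one reference microphone receives a nonzero filter and the multi-channel spectra are used solely as network input.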
