The crux of single-channel speech separation is how to encode a mixture of signals into a latent embedding space in which the signals from different speakers can be precisely separated. Existing methods either transform the speech signals into the frequency domain to perform separation, or learn a separable embedding space by constructing a latent domain based on convolutional filters. While the latter approach achieves substantial improvements, we argue that an embedding space defined by only one latent domain is not sufficient to provide a thoroughly separable encoding space. In this paper, we propose the Stepwise-Refining Speech Separation Network (SRSSN), which follows a coarse-to-fine separation framework. In the coarse phase, it learns a first-order latent domain to define an encoding space and performs a rough separation. In the refining phase, SRSSN then learns a new latent domain along each basis function of the existing latent domain to obtain a high-order latent domain, which enables the model to refine the rough result into a more precise separation. We demonstrate the effectiveness of SRSSN through extensive experiments, including speech separation in a clean (noise-free) setting on the WSJ0-2mix and WSJ0-3mix datasets as well as in noisy and reverberant settings on the WHAM! and WHAMR! datasets. Furthermore, we perform speech recognition on the signals separated by our model to evaluate separation performance indirectly.
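To make the idea of separating in a latent domain concrete, the following is a minimal sketch of mask-based separation on a single frame. It is not the SRSSN architecture: a fixed orthonormal DCT basis stands in for the learned convolutional filters, the masks are random placeholders for what a network would predict, and the refining phase with its high-order latent domain is not modeled. All function names (`dct_basis`, `encode`, `decode`, `separate_frame`, `demo`) are hypothetical and chosen only for illustration.

```python
import math
import random


def dct_basis(n):
    """Orthonormal DCT-II basis: a fixed stand-in for learned filters (assumption)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows


def encode(frame, basis):
    """Project one frame onto every basis function to get its latent weights."""
    return [sum(b * x for b, x in zip(row, frame)) for row in basis]


def decode(latent, basis):
    """Reconstruct a frame as a weighted sum of the basis functions."""
    n = len(basis[0])
    return [sum(latent[k] * basis[k][t] for k in range(len(basis))) for t in range(n)]


def separate_frame(mixture, masks, basis):
    """Apply one mask per speaker in the latent domain, then decode each estimate."""
    latent = encode(mixture, basis)
    return [decode([m * w for m, w in zip(mask, latent)], basis) for mask in masks]


def demo(n=8, seed=0):
    """Separate a random mixture frame into two estimates with placeholder masks."""
    rng = random.Random(seed)
    basis = dct_basis(n)
    mixture = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Hypothetical masks for two speakers; a real model would predict these.
    mask_a = [rng.random() for _ in range(n)]
    mask_b = [1.0 - m for m in mask_a]
    est_a, est_b = separate_frame(mixture, [mask_a, mask_b], basis)
    return mixture, est_a, est_b
```

Because the two masks sum to one on every basis function and the basis is orthonormal, the two estimates returned by `demo()` add back up to the mixture. How cleanly each estimate isolates one speaker depends entirely on how separable the sources are in the chosen latent domain, which is the limitation the abstract attributes to a single latent domain.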