This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation. Prior methods for unsupervised separation required the synthesis of mixtures of mixtures or assumed the existence of a teacher model, making them hard to regard as plausible accounts of how separation abilities emerge in an animal's auditory system. We assume the availability of two-channel mixtures at training time, and train a neural network to separate the sources given one of the channels (say, the left) as input, such that the other (right) channel may be predicted from the separated sources. As the relationship between the room impulse responses (RIRs) of each channel depends on the locations of the sources, which are unknown to the network, the network cannot rely on learning that relationship. Instead, our proposed loss function fits each of the separated sources to the mixture in the target channel via Wiener filtering, and compares the resulting mixture to the ground-truth one. We show that maximizing the scale-invariant signal-to-distortion ratio (SI-SDR) of the predicted right-channel mixture with respect to the ground truth implicitly guides the network towards separating the left-channel sources. On a semi-supervised reverberant speech separation task based on the WHAMR! dataset, using training data where just 5% (resp., 10%) of the mixtures are labeled with associated isolated sources, we achieve 70% (resp., 78%) of the SI-SDR improvement obtained when training with supervision on the full training set, while a model trained only on the labeled data obtains 43% (resp., 45%).
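The loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`si_sdr`, `delay_matrix`, `wiener_fit`, `ras_loss`), the `taps` and `ridge` parameters, and the choice of a time-domain least-squares FIR filter as the Wiener filter are all assumptions made for illustration. Each separated left-channel source is fitted to the right-channel mixture, the fitted signals are summed, and the negative SI-SDR of that sum against the true right-channel mixture is returned as the loss.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR (dB) of `estimate` with respect to `reference`."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def delay_matrix(x, taps):
    """Matrix whose columns are x delayed by 0..taps-1 samples (zero-padded)."""
    n = len(x)
    cols = [np.concatenate([np.zeros(d), x[:n - d]]) for d in range(taps)]
    return np.stack(cols, axis=1)

def wiener_fit(source, target, taps=64, ridge=1e-6):
    """Filter `source` with the FIR filter that best predicts `target`.

    Assumption: a least-squares FIR fit (normal equations with a small
    ridge term) stands in for the Wiener filter named in the abstract.
    """
    X = delay_matrix(source, taps)
    R = X.T @ X + ridge * np.eye(taps)   # autocorrelation estimate
    p = X.T @ target                     # cross-correlation estimate
    h = np.linalg.solve(R, p)
    return X @ h

def ras_loss(separated, target_mixture, taps=64):
    """Negative SI-SDR of the re-synthesized target-channel mixture."""
    predicted = sum(wiener_fit(s, target_mixture, taps) for s in separated)
    return -si_sdr(predicted, target_mixture)
```

Calling `ras_loss([s1_hat, s2_hat], right_mixture)` with two hypothetical separated estimates and the right-channel mixture gives a scalar that is lower when the estimates are better separated, since a single filter per estimate cannot map a residual mixture of sources with different RIRs onto the target channel. For actual training, the same computation would need to be written in a differentiable framework; this sketch only shows the forward pass.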