Audio source separation is often achieved by estimating the magnitude spectrogram of each source, and then applying a phase recovery (or spectrogram inversion) algorithm to retrieve time-domain signals. Typically, spectrogram inversion is treated as an optimization problem with one or several terms that promote estimates complying with a consistency property, a mixing constraint, and/or a target magnitude objective. Nonetheless, it is still unclear which set of constraints and which problem formulation are the most appropriate in practice. In this paper, we design a general framework for deriving spectrogram inversion algorithms, which is based on formulating optimization problems that combine these objectives either as soft penalties or as hard constraints. We solve these problems with algorithms that perform alternating projections onto the subsets corresponding to each objective/constraint. Our framework encompasses existing techniques from the literature as well as novel algorithms. We investigate the potential of these approaches for a speech enhancement task. In particular, one of our novel algorithms outperforms other approaches in a realistic setting where the magnitudes are estimated beforehand using a neural network.
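To make the idea of alternating projections concrete, the following is a minimal sketch, not one of the algorithms derived in the paper. It assumes a single time-frequency bin, treats both the mixing constraint and the target magnitudes as hard constraints, and leaves out the consistency projection entirely, since that one requires an inverse STFT followed by an STFT over the whole signal. All function names (`project_magnitude`, `project_mixing`, `alternate`, `mixing_error`) are hypothetical and chosen for illustration only.

```python
import cmath


def project_magnitude(estimates, target_mags):
    """Nearest point with the prescribed magnitudes: keep each phase, replace the modulus."""
    # cmath.phase(0) returns 0.0, so a zero estimate is simply assigned phase 0.
    return [cmath.rect(v, cmath.phase(s)) for s, v in zip(estimates, target_mags)]


def project_mixing(estimates, mixture):
    """Orthogonal projection onto {s : sum(s) == mixture}: share the residual equally."""
    residual = (mixture - sum(estimates)) / len(estimates)
    return [s + residual for s in estimates]


def mixing_error(estimates, mixture):
    """Modulus of the mismatch between the mixture and the sum of the estimates."""
    return abs(mixture - sum(estimates))


def alternate(mixture, target_mags, init, n_iter=50):
    """Alternate the two projections, ending on the magnitude set."""
    est = project_magnitude(init, target_mags)
    for _ in range(n_iter):
        est = project_mixing(est, mixture)
        est = project_magnitude(est, target_mags)
    return est


# Toy bin (assumed values): two unit-magnitude sources, 1 and 1j, mixed into 1 + 1j.
mixture = 1 + 1j
target_mags = [1.0, 1.0]
# Initial phases are deliberately perturbed away from the true ones.
init = [cmath.rect(1.0, 0.3), cmath.rect(1.0, cmath.pi / 2 - 0.3)]
final = alternate(mixture, target_mags, init)
```

Because each step maps the current point to a nearest point of the other set, the mixing error of the magnitude-constrained iterates cannot increase from one iteration to the next. Note that in this single-bin toy, initializing every source with the mixture phase leaves the iteration stuck, which is one way to see why the consistency property, which couples neighboring bins, matters in practice.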