While deep neural networks have greatly advanced the field of speech enhancement, most existing methods are developed following either heuristic or blind optimization criteria, which severely hampers their interpretability and transparency. Inspired by Taylor's theorem, we propose a general unfolding framework for both single- and multi-channel speech enhancement tasks. Concretely, we formulate complex spectrum recovery as a spectral magnitude mapping in the neighboring space of the noisy mixture, in which a sparse prior is introduced for phase modification in advance. Based on that, the mapping function is decomposed into the superimposition of the 0th-order and high-order polynomial terms of the Taylor series, where the former coarsely removes the interference in the magnitude domain and the latter progressively complement the remaining spectral detail in the complex spectrum domain. In addition, we study the relation between adjacent-order terms and reveal that each high-order term can be recursively estimated from its lower-order term. We then propose to evaluate each high-order term using a surrogate function with trainable weights, so that the whole system can be trained in an end-to-end manner. Extensive experiments are conducted on the WSJ0-SI84, DNS-Challenge, Voicebank+Demand, and spatialized Librispeech datasets. Quantitative results show that the proposed approach not only yields competitive performance against existing top-performing approaches, but also enjoys decent internal transparency and flexibility.
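The decomposition described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' architecture: `taylor_unfold`, `make_surrogate`, and the scalar weights are hypothetical names chosen for illustration, the Taylor coefficients are assumed to be absorbed into each surrogate, and simple scalar gains stand in for the trainable networks. It only shows the control flow: a 0th-order magnitude estimate combined with the noisy phase, followed by high-order terms that are each computed from the previous one and summed.

```python
import numpy as np


def make_surrogate(weight):
    """Hypothetical surrogate for one high-order term.

    In a real system this would be a trainable network; here a single
    scalar weight stands in for the trainable parameters.
    """
    def surrogate(prev_term, noisy_spec):
        # The noisy spectrum is accepted as side information but unused
        # in this toy stand-in.
        return weight * prev_term
    return surrogate


def taylor_unfold(noisy_spec, zeroth_order, surrogates):
    """Sum a 0th-order term and recursively estimated high-order terms.

    noisy_spec   : complex array, e.g. an STFT of shape (freq, time)
    zeroth_order : callable mapping a magnitude array to a coarse estimate
    surrogates   : list of callables, one per high-order term
    """
    # 0th-order term: coarse estimate in the magnitude domain,
    # recombined with the phase of the noisy mixture (an assumption here).
    coarse_mag = zeroth_order(np.abs(noisy_spec))
    term = coarse_mag * np.exp(1j * np.angle(noisy_spec))
    estimate = term

    # High-order terms: each one is estimated from its lower-order
    # neighbour and added in the complex spectrum domain.
    for surrogate in surrogates:
        term = surrogate(term, noisy_spec)
        estimate = estimate + term
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
    enhanced = taylor_unfold(
        noisy,
        zeroth_order=lambda mag: 0.5 * mag,            # toy magnitude gain
        surrogates=[make_surrogate(0.1), make_surrogate(0.1)],
    )
    print(enhanced.shape)
```

Because every stage is an ordinary differentiable function of its input, replacing the scalar gains with neural modules would allow the whole chain to be trained end to end, which is the property the abstract emphasizes.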