DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

The decoupling-style concept has begun to gain traction in speech enhancement: it decouples the original complex spectrum estimation task into multiple easier sub-tasks (i.e., magnitude-only recovery and residual complex spectrum estimation), resulting in better performance and easier interpretability. In this paper, we propose a dual-branch federative magnitude and phase estimation framework, dubbed DBT-Net, for monaural speech enhancement, which aims to recover the coarse- and fine-grained regions of the overall spectrum in parallel. The two branches are complementary: the magnitude estimation branch is designed to filter out dominant noise components in the magnitude domain, while the complex spectrum purification branch is designed to inpaint the missing spectral details and implicitly estimate the phase information in the complex-valued spectral domain. To facilitate the information flow between the branches, interaction modules are introduced to leverage features learned from one branch, so as to suppress the undesired parts and recover the missing components of the other branch. Instead of adopting conventional RNNs or temporal convolutional networks for sequence modeling, we employ a novel attention-in-attention transformer-based network within each branch for better feature learning. More specifically, it is composed of several adaptive spectro-temporal attention transformer-based modules and an adaptive hierarchical attention module, which aim to capture long-term time-frequency dependencies and further aggregate intermediate hierarchical contextual information. Comprehensive evaluations on the WSJ0-SI84 + DNS-Challenge and VoiceBank + DEMAND datasets demonstrate that the proposed approach consistently outperforms previous advanced systems and yields state-of-the-art performance in terms of speech quality and intelligibility.
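
The decoupling idea can be illustrated with a small NumPy sketch of how a coarse magnitude estimate and a residual complex estimate might be fused into one spectrum. This is only an illustration under stated assumptions, not the paper's implementation: the function name `combine_branches`, the use of the noisy phase for the coarse estimate, the additive fusion, and the oracle branch outputs built from random data are all hypothetical, since the abstract does not specify how the two branch outputs are combined.

```python
import numpy as np

def combine_branches(noisy_spec, mag_gain, residual):
    """Fuse two hypothetical branch outputs into one complex spectrum.

    noisy_spec: complex array (freq, time), spectrum of the noisy signal.
    mag_gain:   real array, same shape, gain from a magnitude branch.
    residual:   complex array, same shape, correction from a complex branch.
    """
    # Coarse estimate (assumption): filtered magnitude, phase taken from the noisy input.
    coarse = mag_gain * np.abs(noisy_spec) * np.exp(1j * np.angle(noisy_spec))
    # Fine estimate (assumption): the residual is added to restore detail and adjust phase.
    return coarse + residual

# Synthetic stand-ins for clean and noisy spectra (random data, not real speech).
rng = np.random.default_rng(0)
shape = (161, 50)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Oracle outputs that ideal branches would produce, used here only for illustration.
gain = np.abs(clean) / (np.abs(noisy) + 1e-8)
coarse_only = combine_branches(noisy, gain, np.zeros(shape, dtype=complex))
oracle_residual = clean - coarse_only
enhanced = combine_branches(noisy, gain, oracle_residual)
```

In this toy setting the magnitude path alone matches the clean magnitude but keeps the noisy phase, so `coarse_only` still differs from `clean`; the residual path closes that gap, which is the complementary role the abstract assigns to the complex spectrum purification branch.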
