The spatial covariance matrix is widely regarded as central to beamformer design. Standing at the intersection of traditional beamformers and deep neural networks, we propose a causal neural beamformer paradigm called Embedding and Beamforming, with two core modules designed accordingly, namely EM and BM. In EM, rather than estimating the spatial covariance matrix explicitly, the network learns a 3-D embedding tensor that represents both spectral and spatial discriminative information. In BM, a network directly derives the beamforming weights used to implement the filter-and-sum operation. To further improve speech quality, a post-processing module is introduced to suppress the residual noise. We conduct multichannel speech enhancement experiments on the DNS-Challenge dataset, and the results show that the proposed system outperforms previous advanced baselines by a large margin on multiple evaluation metrics.
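As a rough illustration of the filter-and-sum operation mentioned above, the sketch below applies one complex weight per microphone and time-frequency bin to a multichannel spectrogram and sums over microphones. The function name `filter_and_sum`, the `(mics, freqs, frames)` array layout, and the conjugate convention are assumptions made for illustration, not details taken from the paper. In the proposed system the weights would be produced by the BM network; here they are set by hand.

```python
import numpy as np


def filter_and_sum(weights, mixture):
    """Apply per-bin complex weights and sum over microphones.

    Both inputs are complex arrays of shape (mics, freqs, frames);
    the result has shape (freqs, frames). This layout is an assumption.
    """
    if weights.shape != mixture.shape:
        raise ValueError("weights and mixture must share the same shape")
    # y[f, t] = sum_m conj(w[m, f, t]) * x[m, f, t]
    return np.einsum("mft,mft->ft", np.conj(weights), mixture)


# Toy multichannel spectrogram: 4 mics, 5 frequency bins, 3 frames.
rng = np.random.default_rng(0)
n_mics, n_freqs, n_frames = 4, 5, 3
shape = (n_mics, n_freqs, n_frames)
mixture = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Hypothetical hand-set weights that pass microphone 0 through unchanged.
weights = np.zeros(shape, dtype=complex)
weights[0] = 1.0

enhanced = filter_and_sum(weights, mixture)
```

With these hand-set weights the output is simply the spectrogram of microphone 0, which makes the sketch easy to check. Learned weights would instead combine all channels in each bin.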