Despite the promising performance of existing frame-wise all-neural beamformers in the speech enhancement field, their underlying mechanism remains unclear. In this paper, we revisit beamforming behavior from the beam-space dictionary perspective and formulate it as the learning and mixing of different beam-space components. Based on this view, we propose an all-neural beamformer called TaylorBM that simulates the Taylor series expansion, in which the 0th-order term serves as a spatial filter to conduct beam mixing, while several high-order terms are tasked with residual noise cancellation as post-processing. The whole system is devised to work in an end-to-end manner. Experiments are conducted on the spatialized LibriSpeech corpus, and the results show that the proposed approach outperforms existing advanced baselines in terms of evaluation metrics.
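As an illustrative sketch only (the symbols $B_n$, $w_n$, and $\mathcal{G}_q$ are assumptions for exposition, not the paper's notation), the beam-space view of TaylorBM described above can be summarized as

$$\hat{S}(t,f) \;=\; \underbrace{\sum_{n=1}^{N} w_n(t,f)\, B_n(t,f)}_{\text{0th-order term: beam mixing}} \;+\; \underbrace{\sum_{q=1}^{Q} \mathcal{G}_q\!\big(Y(t,f)\big)}_{\text{high-order terms: residual noise cancellation}},$$

where $B_n(t,f)$ denotes the $n$-th beam-space component of the multi-channel mixture $Y$, $w_n(t,f)$ are frame-wise mixing weights predicted by the network (the 0th-order spatial filter), and the high-order terms $\mathcal{G}_q$ act as a learned post-processor, with all modules trained jointly in an end-to-end manner.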