Frequency-domain neural beamformers are the mainstream approach in recent multi-channel speech separation models. Despite their well-defined behavior and effectiveness, such beamformers have two limitations: their oracle performance is bounded, and it is difficult to design suitable networks for complex-valued operations. In this paper, we propose a time-domain generalized Wiener filter (TD-GWF), an extension of conventional frequency-domain beamformers that has higher oracle performance and involves only real-valued operations. We also discuss how the TD-GWF relates to conventional frequency-domain beamformers. Experimental results show that replacing frequency-domain beamformers with the TD-GWF in recently proposed sequential neural beamforming pipelines yields a significant performance improvement.
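To make the idea of a real-valued, time-domain filter concrete, the following is a minimal sketch of a generic multi-channel FIR Wiener filter estimated by regularized least squares. It is not the TD-GWF formulation from the paper, whose details the abstract does not give. The function names (`stack_taps`, `time_domain_wiener`), the tap count, the regularizer `eps`, and the use of a clean target signal as an "oracle" reference are all assumptions made for illustration.

```python
import numpy as np


def stack_taps(mixture, num_taps):
    """Stack delayed copies of every channel into one real-valued matrix.

    mixture: array of shape (channels, samples)
    returns: array of shape (channels * num_taps, samples)
    """
    channels, samples = mixture.shape
    rows = []
    for c in range(channels):
        for d in range(num_taps):
            delayed = np.zeros(samples)
            # Delay channel c by d samples, zero-padding at the start.
            delayed[d:] = mixture[c, :samples - d]
            rows.append(delayed)
    return np.stack(rows)


def time_domain_wiener(mixture, target, num_taps=8, eps=1e-6):
    """Estimate a multi-channel FIR filter that maps the mixture to the target.

    Solves the regularized normal equations (Y Y^T + eps I) w = Y s,
    using only real-valued operations.
    """
    Y = stack_taps(mixture, num_taps)      # (channels * num_taps, samples)
    R = Y @ Y.T                            # real-valued covariance matrix
    p = Y @ target                         # cross-correlation with the target
    w = np.linalg.solve(R + eps * np.eye(R.shape[0]), p)
    return w @ Y, w                        # filtered signal and filter taps


# Hypothetical toy data: one source observed on two noisy channels.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
noisy = np.stack([source + 0.5 * rng.standard_normal(2000) for _ in range(2)])

estimate, taps = time_domain_wiener(noisy, source, num_taps=4)
error_in = np.mean((noisy[0] - source) ** 2)    # error of the raw first channel
error_out = np.mean((estimate - source) ** 2)   # error after filtering
```

In this sketch the filter is computed from the clean source, which is the sense in which "oracle" performance is usually measured. In a neural pipeline, the target would presumably be replaced by a network's estimate of the source, but that step is outside what this example shows.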