Deep neural networks (DNNs) are highly effective for multichannel speech enhancement with fixed array geometries. However, applying them to ad-hoc arrays, where the order and placement of microphones are unknown, is not trivial. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing and temporal processing, and to use self-attention for the spatial processing. Using self-attention for spatial processing makes the network invariant to the order and the number of microphones. The temporal processing is performed independently for each channel using a recently proposed dual-path attentive recurrent network. The proposed network is a multiple-input multiple-output (MIMO) architecture that can simultaneously enhance the signals at all microphones. Experimental results demonstrate the excellent performance of the proposed approach. Further, we present an analysis demonstrating that the proposed network effectively uses multichannel information, even from microphones at far locations.
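To illustrate why self-attention over the microphone axis is insensitive to microphone order and count, the following is a minimal sketch and not the triple-path network itself. It assumes a single attention head, no positional encoding, and random weights. All names (`spatial_self_attention`, `random_matrix`, `wq`, `wk`, `wv`) are hypothetical, and pure Python is used only to keep the example self-contained. In a multiple-input multiple-output setting, the order property shows up as permutation equivariance: reordering the input channels reorders the outputs in the same way.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_self_attention(x, wq, wk, wv):
    """Single-head self-attention across the microphone axis (illustrative only).

    x holds one feature row per microphone (M rows, D columns). No positional
    encoding is applied, so the computation never depends on the row order.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    """Hypothetical helper: matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D = 4  # assumed feature size per microphone
wq, wk, wv = (random_matrix(D, D, rng) for _ in range(3))

# Five microphones, then the same microphones in a shuffled order.
x = random_matrix(5, D, rng)
perm = [3, 0, 4, 1, 2]
out = spatial_self_attention(x, wq, wk, wv)
out_perm = spatial_self_attention([x[i] for i in perm], wq, wk, wv)

# Row i of the shuffled output should match row perm[i] of the original output.
max_diff = max(
    abs(a - b)
    for i, p in enumerate(perm)
    for a, b in zip(out_perm[i], out[p])
)

# The same weights also accept a different number of microphones.
out_small = spatial_self_attention(x[:3], wq, wk, wv)
```

Because the projection weights act on each microphone's features separately and the softmax runs over however many microphones are present, the same parameters apply to any array size. `max_diff` should differ from zero only by floating-point rounding.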