Speaker localization using microphone arrays depends on accurate time delay estimation. For decades, methods based on the generalized cross-correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel extension of GCC-PHAT in which the received signals are filtered by a shift-equivariant neural network that preserves the timing information contained in the signals. Through extensive experiments, we show that our model consistently reduces the error of GCC-PHAT in adverse environments while guaranteeing exact time delay recovery in ideal conditions.
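To make the role of shift equivariance concrete, the following is a minimal sketch of classical GCC-PHAT delay estimation in NumPy. It is not the authors' model. The function name `gcc_phat`, the parameter `max_delay`, the signal lengths, and the random FIR filter `taps` are all assumptions chosen for illustration. The single linear convolution merely stands in for a shift-equivariant filter; the neural network described above is not reproduced here.

```python
import numpy as np


def gcc_phat(x1, x2, max_delay, eps=1e-12):
    """Return the estimated delay (in samples) of x2 relative to x1."""
    n = len(x1) + len(x2)                  # zero-pad so the correlation is linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)               # cross-power spectrum
    cross = cross / (np.abs(cross) + eps)  # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the entries correspond to lags -max_delay .. +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay


# Hypothetical noise-free setup: the second microphone receives the same
# signal as the first, delayed by a whole number of samples.
rng = np.random.default_rng(0)
true_delay, max_delay = 7, 32
source = rng.standard_normal(1024)
x1 = np.pad(source, (0, max_delay))
x2 = np.pad(source, (true_delay, max_delay - true_delay))

# Stand-in for a shift-equivariant filter (an assumption, not the paper's
# network): the same linear convolution applied to both channels.
taps = rng.standard_normal(8)
y1 = np.convolve(x1, taps)
y2 = np.convolve(x2, taps)

delay_raw = gcc_phat(x1, x2, max_delay)
delay_filtered = gcc_phat(y1, y2, max_delay)
print(delay_raw, delay_filtered)
```

Because the same shift-equivariant operation is applied to both channels, the filtered signals remain shifted copies of one another, so in this idealized setting the delay estimated after filtering should match the one estimated from the raw signals.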