In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an icosahedral Convolutional Neural Network (CNN) applied to SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which are a good approximation of the continuous space of spherical rotations, and it can be implemented with standard 2D convolutional layers, giving it a lower computational cost than most spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and that allows us to treat DOA estimation as a regression problem by interpreting the output of the convolutional layers as a probability distribution. We show that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than $10^\circ$ even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
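To illustrate the soft-argmax idea mentioned above, the following is a minimal NumPy sketch and not the implementation used in this work. It assumes that the convolutional output is a vector of scores, one per grid direction on the sphere, that a softmax turns those scores into a probability distribution, and that the DOA is taken as the normalized expected value of the grid directions in Cartesian coordinates. The function name `soft_argmax_doa`, the sharpness parameter `beta`, and the six-point grid `AXES` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy grid: the six axis directions (+x, -x, +y, -y, +z, -z).
# A real system would use a much denser grid, e.g. the icosahedral one.
AXES = np.array([
    [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
])


def soft_argmax_doa(scores, directions, beta=1.0):
    """Differentiable stand-in for argmax over a spherical grid (sketch).

    scores:     (N,) unnormalized scores, one per grid direction.
    directions: (N, 3) unit vectors of the grid directions.
    beta:       assumed sharpness factor; larger values approach a hard argmax.
    Returns a unit vector pointing at the estimated DOA.
    """
    scores = np.asarray(scores, dtype=float)
    directions = np.asarray(directions, dtype=float)

    # Numerically stable softmax: interpret the scores as a probability
    # distribution over the grid directions.
    z = beta * (scores - scores.max())
    weights = np.exp(z)
    weights /= weights.sum()

    # Expected direction under that distribution, projected back to the sphere.
    mean_vec = weights @ directions
    norm = np.linalg.norm(mean_vec)
    if norm < 1e-12:
        raise ValueError("expected direction is undefined (weights cancel out)")
    return mean_vec / norm


# Usage: a strong peak at +z, and two equal peaks at +x and +y.
doa_peak = soft_argmax_doa([0, 0, 0, 0, 10, 0], AXES)
doa_between = soft_argmax_doa([5, -5, 5, -5, 0, 0], AXES)
```

Unlike a hard argmax, the estimate is not restricted to the grid points: with two equal peaks at +x and +y, the result lies midway between them, and every step is differentiable with respect to the scores, which is what makes a regression loss on the DOA possible.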