Neural networks (NNs) have been widely applied to speech processing tasks, in particular those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed, position-specific microphone arrays. In this paper, we present an NN architecture that can cope with microphone arrays in which the number and positions of the microphones are unknown, and demonstrate its applicability to the speech dereverberation task. To this end, our approach harnesses recent advances in deep learning on set-structured data to design an architecture that enhances the reverberant log-spectrum. We use noisy and noiseless versions of a simulated reverberant dataset to test the proposed architecture. Our experiments on the noisy data show that the proposed scene-agnostic setup outperforms a powerful scene-aware framework, sometimes even with fewer microphones. With the noiseless dataset we show that, in most cases, our method outperforms the position-aware network as well as the state-of-the-art weighted linear prediction error (WPE) algorithm.
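To illustrate the idea behind learning on set-structured data, the following is a minimal sketch of a permutation-invariant mapping over microphone channels. It is not the architecture proposed in this paper: the layer sizes, the random weights `W_phi` and `W_rho`, the mean-pooling choice, and the function name `enhance_frame` are all assumptions made for illustration. The sketch only shows why a shared per-microphone transform followed by pooling yields an output that does not depend on the number or ordering of the microphones.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FREQ = 8     # hypothetical number of frequency bins per frame
N_HIDDEN = 16  # hypothetical hidden width

# Randomly initialised weights stand in for trained parameters (assumption).
W_phi = rng.standard_normal((N_FREQ, N_HIDDEN))
W_rho = rng.standard_normal((N_HIDDEN, N_FREQ))


def enhance_frame(log_spectra: np.ndarray) -> np.ndarray:
    """Map an (n_mics, N_FREQ) set of log-spectra to one (N_FREQ,) estimate."""
    # The same transform is applied to every microphone independently.
    features = np.tanh(log_spectra @ W_phi)   # shape (n_mics, N_HIDDEN)
    # Mean pooling removes any dependence on microphone order or count.
    pooled = features.mean(axis=0)            # shape (N_HIDDEN,)
    # Map the pooled representation back to a single log-spectrum.
    return pooled @ W_rho                     # shape (N_FREQ,)


frame = rng.standard_normal((5, N_FREQ))      # five microphones
shuffled = frame[rng.permutation(5)]          # same microphones, new order

out = enhance_frame(frame)
out_shuffled = enhance_frame(shuffled)        # matches `out` up to rounding
out_three = enhance_frame(frame[:3])          # also accepts three microphones
```

Because the pooling step is symmetric in its inputs, the same function accepts any number of microphones in any order, which is the property a scene-agnostic network needs. A trained model would replace the random weights with learned ones and would typically use deeper per-microphone and post-pooling networks.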