Sound field decomposition predicts waveforms in arbitrary directions using signals from a limited number of microphones as inputs. Sound field decomposition is fundamental to downstream tasks, including source localization, source separation, and spatial audio reproduction. Conventional sound field decomposition methods such as Ambisonics have limited spatial decomposition resolution. This paper proposes a learning-based Neural Sound field Decomposition (NeSD) framework that enables sound field decomposition with fine spatial direction resolution, using recordings from the capsules of a few microphones placed at arbitrary positions. The inputs of a NeSD system include microphone signals, microphone positions, and queried directions. The outputs of a NeSD system include the waveform and the presence probability for the queried direction. We model NeSD systems with several neural network architectures, including fully connected, time delay, and recurrent neural networks. We show that the NeSD systems outperform the conventional Ambisonics method and DOANet in sound field decomposition and source localization on speech, music, and sound event datasets. Demos are available at https://www.youtube.com/watch?v=0GIr6doj3BQ.
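To make the input/output interface described above more concrete, here is a minimal sketch of a non-learned stand-in that consumes the same three inputs (microphone signals, microphone positions, and a queried direction) and returns a waveform estimate for that direction. It is a plain delay-and-sum beamformer under a far-field plane-wave assumption. It is not the NeSD networks and it does not estimate a presence probability. The function names, the 343 m/s speed of sound, and the rounding of delays to whole samples are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed speed of sound in air (m/s); not specified in the abstract.
SPEED_OF_SOUND = 343.0


def direction_vector(azimuth: float, elevation: float) -> Tuple[float, float, float]:
    """Unit vector pointing from the array origin toward (azimuth, elevation), in radians."""
    return (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )


def arrival_delays(
    mic_positions: Sequence[Tuple[float, float, float]],
    azimuth: float,
    elevation: float,
    sample_rate: float,
) -> List[float]:
    """Arrival delay (in samples) at each microphone, relative to the origin,
    for a far-field plane wave coming from the queried direction."""
    u = direction_vector(azimuth, elevation)
    delays = []
    for p in mic_positions:
        projection = sum(p_i * u_i for p_i, u_i in zip(p, u))
        # A microphone displaced toward the source hears the wave earlier,
        # so its delay is negative.
        delays.append(-projection / SPEED_OF_SOUND * sample_rate)
    return delays


def delay_and_sum(
    mic_signals: Sequence[Sequence[float]],
    mic_positions: Sequence[Tuple[float, float, float]],
    azimuth: float,
    elevation: float,
    sample_rate: float,
) -> List[float]:
    """Hypothetical stand-in for a direction query: align every channel to the
    origin for the queried direction and average (delays rounded to whole samples)."""
    delays = arrival_delays(mic_positions, azimuth, elevation, sample_rate)
    num_samples = len(mic_signals[0])
    output = [0.0] * num_samples
    for signal, delay in zip(mic_signals, delays):
        shift = round(delay)
        for t in range(num_samples):
            # Reading x_m(t + tau_m) undoes the arrival delay tau_m of channel m.
            source_index = t + shift
            if 0 <= source_index < num_samples:
                output[t] += signal[source_index]
    return [value / len(mic_signals) for value in output]
```

Steering toward the true source direction adds the realigned channels coherently. Steering elsewhere leaves them misaligned, so the output peak is lower.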