Our environment is filled with rich and dynamic acoustic information. When we walk into a cathedral, the reverberations, as much as the visual appearance, inform us of the sanctuary's wide open space. Similarly, as an object moves around us, we expect the sound it emits to exhibit this movement as well. While recent advances in learned implicit functions have led to increasingly high-quality representations of the visual world, there have not been commensurate advances in learning spatial auditory representations. To address this gap, we introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene. By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds. We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations. We further show that the representation learned by NAFs can help improve visual learning with sparse views. Finally, we show that a representation informative of scene structure emerges during the learning of NAFs.
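
To make the linear time-invariant assumption concrete: under that model, the sound heard at a listener position is the emitted signal convolved with the impulse response for that emitter and listener pair. The sketch below is illustrative only and is not the authors' implementation. The function name `render_at_listener` and the hand-written `toy_ir` are hypothetical stand-ins for an impulse response that a trained model would predict.

```python
import numpy as np


def render_at_listener(dry_sound: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Apply an impulse response to a dry signal by linear convolution.

    Hypothetical helper: in an LTI model, this is how any emitted sound
    is turned into the sound received at a given listener position.
    """
    return np.convolve(dry_sound, impulse_response, mode="full")


# Toy impulse response with made-up values (an assumption for illustration):
# a direct path followed by two weaker, delayed reflections.
toy_ir = np.zeros(8)
toy_ir[0] = 1.0    # direct sound
toy_ir[3] = 0.5    # first reflection, 3 samples later
toy_ir[6] = 0.25   # second reflection, 6 samples later

# A single click reproduces the impulse response itself (padded with zeros).
click = np.array([1.0, 0.0, 0.0, 0.0])
wet_click = render_at_listener(click, toy_ir)

# Linearity: rendering a mix equals mixing the individual renderings.
rng = np.random.default_rng(0)
sound_a = rng.standard_normal(16)
sound_b = rng.standard_normal(16)
mixed_then_rendered = render_at_listener(2.0 * sound_a + 3.0 * sound_b, toy_ir)
rendered_then_mixed = (2.0 * render_at_listener(sound_a, toy_ir)
                       + 3.0 * render_at_listener(sound_b, toy_ir))
```

Because the same impulse response works for any input signal, a model only needs to predict the response for each emitter and listener pair, and arbitrary sounds can be rendered afterwards.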