We propose MESH2IR, a mesh-based neural network that generates acoustic impulse responses (IRs) for indoor 3D scenes represented as meshes. The IRs are used to create a high-quality sound experience in interactive applications and audio processing. Our method handles input triangular meshes with arbitrary topologies (2K to 3M triangles). We present a novel technique for training MESH2IR using energy decay relief and highlight its benefits. We also show that training MESH2IR on IRs preprocessed with our proposed technique significantly improves the accuracy of IR generation. We reduce the non-linearity in the mesh space by transforming 3D scene meshes into a latent space using a graph convolution network. MESH2IR is more than 200 times faster than a geometric acoustic algorithm on a CPU and can generate more than 10,000 IRs per second on an NVIDIA GeForce RTX 2080 Ti GPU for a given furnished indoor 3D scene. Acoustic metrics are used to characterize the acoustic environment, and we show that the acoustic metrics of the IRs predicted by MESH2IR match the ground truth with less than 10% error. We also highlight the benefits of MESH2IR in audio and speech processing applications such as speech dereverberation and speech separation. To the best of our knowledge, ours is the first neural-network-based approach to predict IRs from a given 3D scene mesh in real time.
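The abstract does not say which acoustic metrics are compared or how the energy decay is computed. As a purely illustrative sketch, the Python below estimates reverberation time (T60), one commonly used acoustic metric, from an IR via Schroeder backward integration, the full-band counterpart of a per-frequency energy decay relief. The synthetic IR, the function names, and the -5 dB to -25 dB fit range are assumptions made for this example and are not taken from MESH2IR.

```python
import math


def synthetic_ir(t60, fs, duration):
    """Hypothetical test signal: an exponentially decaying IR whose amplitude
    drops by 60 dB after t60 seconds (sign alternates each sample)."""
    k = 3.0 * math.log(10.0) / t60  # amplitude decay rate giving -60 dB at t60
    n_samples = int(duration * fs)
    return [(-1) ** n * math.exp(-k * n / fs) for n in range(n_samples)]


def energy_decay_curve_db(ir):
    """Schroeder backward integration: EDC[n] = sum of ir[m]^2 for m >= n,
    returned in dB relative to the total energy EDC[0]."""
    edc = [0.0] * len(ir)
    running = 0.0
    for n in range(len(ir) - 1, -1, -1):
        running += ir[n] * ir[n]
        edc[n] = running
    total = edc[0]
    floor = 1e-300  # avoids log10(0) for IRs that end in exact zeros
    return [10.0 * math.log10(max(e, floor) / total) for e in edc]


def estimate_t60(ir, fs, hi_db=-5.0, lo_db=-25.0):
    """Least-squares line fit to the decay curve between hi_db and lo_db,
    extrapolated to a 60 dB drop. The fit range is an assumed choice."""
    edc_db = energy_decay_curve_db(ir)
    pts = [(n / fs, d) for n, d in enumerate(edc_db) if lo_db <= d <= hi_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def relative_error(predicted, reference):
    """Relative error of a predicted metric against a reference value."""
    return abs(predicted - reference) / abs(reference)


if __name__ == "__main__":
    fs = 8000
    ir = synthetic_ir(t60=0.5, fs=fs, duration=1.0)
    t60 = estimate_t60(ir, fs)
    print(f"estimated T60: {t60:.3f} s, "
          f"relative error: {relative_error(t60, 0.5):.2%}")
```

Under these assumptions, a predicted IR and its ground-truth counterpart could each be passed through `estimate_t60` and compared with `relative_error`, which is one way to read a statement such as "less than 10% error".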