We present a novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and an end-to-end sound radiation network for acoustic transfer. Our mixed vibration solver consists of a 3D sparse convolution network and a Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) module for iterative optimization. Moreover, we highlight the correlation between a standard modal vibration solver and our network architecture. Our radiation network predicts Far-Field Acoustic Transfer maps (FFAT maps) from the surface vibration of the object. For any new object, the overall running time of our learning-based method is less than one second on a GTX 3080 Ti GPU, while the resulting sound quality remains close to that of the ground truth computed with standard numerical methods. We also evaluate the numerical and perceptual accuracy of our sound synthesis approach on objects of various materials.
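To make the mixed-solver idea concrete, the following is a minimal sketch, not the authors' implementation, of how a network-predicted set of approximate mode shapes can be refined by an LOBPCG iteration for the generalized eigenproblem K u = λ M u that underlies modal analysis. The stiffness and mass matrices, the number of modes, and the random stand-in for the network output are all hypothetical placeholders; SciPy's `lobpcg` is used here in place of the paper's coupled solver.

```python
# Sketch: refine approximate mode shapes (e.g., a 3D sparse CNN's output)
# with LOBPCG. K and M below are toy placeholders, not FEM matrices from
# the paper; the random U0 stands in for the network prediction.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

def refine_modes(K, M, U0, n_iters=10):
    """Refine mode-shape guesses U0 for the eigenproblem K u = lambda M u.

    K, M : sparse stiffness and mass matrices (n x n)
    U0   : (n, k) initial guess, e.g. the output of a learned predictor
    Returns eigenvalues (squared angular frequencies) and refined mode shapes.
    """
    eigvals, eigvecs = lobpcg(K, U0, B=M, maxiter=n_iters, largest=False)
    return eigvals, eigvecs

if __name__ == "__main__":
    n, k = 500, 8                                       # toy problem size, 8 modes
    K = sp.diags(np.linspace(1.0, 100.0, n)).tocsr()    # placeholder stiffness matrix
    M = sp.identity(n, format="csr")                    # placeholder mass matrix
    U0 = np.random.default_rng(0).standard_normal((n, k))  # stands in for the network output
    lam, U = refine_modes(K, M, U0)
    freqs_hz = np.sqrt(np.abs(lam)) / (2.0 * np.pi)     # modal frequencies in Hz
    print(freqs_hz)
```

In this sketch the quality of the initial guess U0 determines how few LOBPCG iterations are needed, which is the intuition behind combining a learned predictor with an iterative eigensolver.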