We present the Geometric-Wave Acoustic (GWA) dataset, a large-scale audio dataset of about 2 million synthetic room impulse responses (IRs) together with their detailed geometric and simulation configurations. The dataset samples acoustic environments from over 6.8K high-quality, diverse, and professionally designed houses represented as semantically labeled 3D meshes. We also present a novel scheme for assigning real-world acoustic materials based on semantic matching with a sentence transformer model. We compute high-quality impulse responses that capture accurate low-frequency and high-frequency wave effects by automatically calibrating geometric acoustic ray tracing with a finite-difference time-domain (FDTD) wave solver. We demonstrate the higher accuracy of our IRs through comparisons with IRs recorded in complex real-world environments. Moreover, we highlight the benefits of GWA for audio deep learning tasks such as automatic speech recognition, speech enhancement, and speech separation. GWA is the first dataset with accurate wave acoustic simulations in complex scenes. Code and data are available at https://gamma.umd.edu/pro/sound/gwa.
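The material assignment step maps each semantic label in a house mesh to the acoustic material whose name is most similar in an embedding space. The sketch below illustrates only that general idea of embedding-based nearest-neighbor matching, under explicit assumptions: the `embed` function is a hypothetical stand-in (a bag of character trigrams) for a sentence-transformer encoder, and the mesh labels and material names are invented examples, not GWA's material database or the authors' actual pipeline.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """HYPOTHETICAL stand-in for a sentence-transformer encoder.

    Represents text as a bag of character trigrams; a real pipeline would
    use learned sentence embeddings instead.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_materials(mesh_labels, material_names):
    """Map each semantic mesh label to the most similar material name."""
    # Embed the candidate materials once, then match each label against them.
    material_vecs = {m: embed(m) for m in material_names}
    assignment = {}
    for label in mesh_labels:
        label_vec = embed(label)
        assignment[label] = max(
            material_names, key=lambda m: cosine(label_vec, material_vecs[m])
        )
    return assignment


# Illustrative (invented) material names and mesh labels.
materials = ["carpet on concrete", "glass window", "wooden door", "plaster wall"]
print(assign_materials(["carpet", "window glass", "wood door"], materials))
```

Character trigrams only capture surface spelling overlap; the point of using a sentence transformer is that semantically related but differently spelled labels (for example, "floor covering" and "carpet") can still land close together in embedding space.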