Estimating Head-Related Transfer Functions (HRTFs) at arbitrary source positions is essential for immersive binaural audio rendering. Computing each individual's HRTFs is challenging: traditional approaches are time-consuming and computationally expensive, while modern data-driven approaches are data-hungry. For data-driven approaches in particular, existing HRTF datasets differ in the spatial sampling distributions of their source positions, which poses a major problem when generalizing a method across multiple datasets. To alleviate this, we propose a deep learning method based on a novel conditioning architecture. The proposed method can predict the HRTF at any position by interpolating HRTFs from known spatial distributions. Experimental results show that the proposed architecture improves the model's generalizability across datasets with various coordinate systems. Additional demonstrations show that the model robustly reconstructs target HRTFs from spatially downsampled HRTFs, in terms of both quantitative and perceptual measures.
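The abstract does not describe the network itself. For orientation only, the underlying interpolation problem can be illustrated with a classical, non-learned technique: inverse-distance weighting (IDW) on the sphere. This is NOT the proposed method. It is a minimal sketch, assuming HRTFs stored as magnitude spectra and source positions given as azimuth/elevation in degrees. The function names `sph_to_unit` and `idw_interpolate`, the coordinate convention, and the parameters `k` and `power` are all hypothetical. Mapping every dataset's positions to unit vectors in one common frame is the step that positions from different coordinate systems would need before any interpolation.

```python
import numpy as np


def sph_to_unit(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation (degrees) to unit Cartesian vectors.

    Assumed convention (hypothetical): azimuth counterclockwise from the front
    (+x) in the horizontal plane, elevation upward from that plane. Datasets
    using other conventions must be mapped to this one first.
    """
    az = np.radians(np.asarray(azimuth_deg, dtype=float))
    el = np.radians(np.asarray(elevation_deg, dtype=float))
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)


def idw_interpolate(known_dirs, known_hrtfs, query_dirs, k=3, power=2.0, eps=1e-9):
    """Inverse-distance-weighted interpolation of HRTF magnitudes on the sphere.

    known_dirs:  (N, 3) unit vectors of measured source directions
    known_hrtfs: (N, F) magnitude spectra measured at those directions
    query_dirs:  (M, 3) unit vectors of directions to estimate
    Returns an (M, F) array of estimated spectra.
    """
    # Great-circle (angular) distance between every query and every known direction.
    cos_sim = np.clip(query_dirs @ known_dirs.T, -1.0, 1.0)  # (M, N)
    dist = np.arccos(cos_sim)
    # Keep only the k nearest measured directions for each query.
    idx = np.argsort(dist, axis=1)[:, :k]                    # (M, k)
    d = np.take_along_axis(dist, idx, axis=1)                # (M, k)
    # Closer neighbours get larger weights; eps avoids division by zero.
    w = 1.0 / (d + eps) ** power
    w /= w.sum(axis=1, keepdims=True)
    # Weighted sum of the neighbouring spectra: (M, k) x (M, k, F) -> (M, F).
    return np.einsum("mk,mkf->mf", w, known_hrtfs[idx])
```

For example, with measurements at azimuths 0°, 90°, 180°, 270° on the horizontal plane plus both poles, `idw_interpolate(dirs, hrtfs, sph_to_unit([45], [0]), k=2)` averages the spectra at 0° and 90°, since both lie at the same angular distance from 45°. Measuring distance on unit vectors rather than in raw degrees avoids the azimuth wrap-around at 0°/360°. A sparse, dataset-specific sampling grid directly limits how well such a fixed rule can reconstruct the target, which is the gap a learned, position-conditioned model aims to close.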