In the growing field of virtual auditory display, personalized head-related transfer functions (HRTFs) play a vital role in establishing an accurate sound image. In this work, we propose an HRTF personalization method that uses convolutional neural networks (CNNs) to predict a subject's HRTFs for all directions from their scanned head geometry. To ease the training of the CNN models, we propose novel pre-processing methods that yield compact representations of both the head scans and the HRTF data. For the head scan, we use truncated spherical cap harmonic (SCH) coefficients to represent the pinna area, which is important in the acoustic scattering process. For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets. One CNN model is trained to predict the SH coefficients of the HRTF magnitudes from the SCH coefficients of the scanned ear geometry and other anthropometric measurements of the head. The other CNN model is trained to predict the SH coefficients of the HRTF onsets from only the anthropometric measurements of the ear, head, and torso. Combining the magnitude and onset predictions, our method predicts complete, global HRTF data. A leave-one-out validation with the log-spectral distortion (LSD) metric is used for objective evaluation. The results show a decent LSD level in both the spatial and temporal dimensions relative to the ground-truth HRTFs, and a lower LSD than the boundary element method (BEM) simulation of HRTFs that the database provides. The localization simulation results with an auditory model are consistent with the objective evaluation metrics, showing that localization responses with our predicted HRTFs are significantly better than with the BEM-calculated ones.
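The abstract names the LSD metric without stating its formula. As a minimal sketch, assuming the commonly used definition (the root-mean-square, over frequency bins, of the dB difference between reference and predicted magnitudes), it could be computed as follows. The function names, the `eps` guard, and the averaging over directions are illustrative assumptions, not the authors' implementation.

```python
import math


def log_spectral_distortion(h_ref, h_pred, eps=1e-12):
    """LSD in dB for one direction (assumed definition, see lead-in).

    h_ref, h_pred: sequences of HRTF values (real or complex), one per
    frequency bin. eps is a hypothetical guard against log of zero.
    """
    if len(h_ref) != len(h_pred) or len(h_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_sum = 0.0
    for ref_bin, pred_bin in zip(h_ref, h_pred):
        # Magnitude ratio expressed in decibels for this frequency bin.
        diff_db = 20.0 * math.log10((abs(ref_bin) + eps) / (abs(pred_bin) + eps))
        squared_sum += diff_db ** 2
    # Root-mean-square over frequency bins.
    return math.sqrt(squared_sum / len(h_ref))


def mean_lsd(ref_by_direction, pred_by_direction):
    """Average the per-direction LSD over all directions (assumed aggregation)."""
    if len(ref_by_direction) != len(pred_by_direction) or not ref_by_direction:
        raise ValueError("need the same non-zero number of directions")
    values = [
        log_spectral_distortion(ref, pred)
        for ref, pred in zip(ref_by_direction, pred_by_direction)
    ]
    return sum(values) / len(values)
```

In a leave-one-out setting, `mean_lsd` would be evaluated once per held-out subject, with the reference being that subject's measured HRTFs and the prediction coming from a model trained on the remaining subjects.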