Binaural rendering of ambisonic signals is of broad interest to virtual reality and immersive media. Conventional methods often require manually measured Head-Related Transfer Functions (HRTFs). To remove this requirement, we collect a paired ambisonic-binaural dataset and propose an end-to-end deep learning framework that maps ambisonic signals directly to binaural signals. Experimental results show that neural networks outperform the conventional method on objective metrics and achieve comparable results on subjective metrics. To validate the proposed framework, we experimentally explore different settings of the input features, model structures, output features, and loss functions. Our proposed system achieves an SDR of 7.32 and MOSs of 3.83, 3.58, 3.87, and 3.58 in the quality, timbre, localization, and immersion dimensions, respectively.
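
For readers unfamiliar with the objective metric, the following is a minimal sketch of one common definition of the signal-to-distortion ratio (SDR), expressed in decibels. The abstract does not state which SDR variant or evaluation toolkit was used, so the function name `sdr_db`, the `eps` stabilizer, and the choice to pool energy over both ear channels are assumptions made for illustration only.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-10):
    """Plain energy-ratio SDR in dB (hypothetical helper, not the paper's code).

    reference, estimate: arrays of the same shape, e.g. (2, num_samples)
    for a left/right binaural pair. Energy is summed over all entries.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")

    # Energy of the target signal and of the residual error.
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)

    # eps guards against division by zero and log of zero (assumed value).
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


if __name__ == "__main__":
    # Toy example: the estimate is the reference scaled by 0.9,
    # so the error has 1% of the reference energy.
    ref = np.ones((2, 100))
    est = 0.9 * ref
    print(f"SDR: {sdr_db(ref, est):.2f} dB")
```

Under this definition a higher value means the rendered binaural signal is closer to the recorded one; scale-invariant variants of SDR would give different numbers for the same pair of signals.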