Social presence, the feeling of being there with a real person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars, those that minimize the uncanny effect, rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affect the accuracy of facial expression transfer algorithms include differences between VR headsets (e.g., camera configuration, slop of the headset), changes in facial appearance over time (e.g., beard, make-up), and environmental factors (e.g., lighting, backgrounds). This sensitivity is a major drawback for the scalability of these models in VR. This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture (MIA) trained with specialized augmentation strategies. MIA drives the shape component of the avatar from three cameras in the VR headset (two for the eyes, one for the mouth) for subjects not seen during training, using minimal personalized information (i.e., a neutral 3D mesh shape). Similarly, if the PS texture decoder is available, MIA is able to drive the full avatar (shape + texture), robustly outperforming PS models in challenging scenarios. Our key contribution to improving robustness and generalization is that our method implicitly decouples, in an unsupervised manner, the facial expression from nuisance factors (e.g., headset, environment, facial appearance). We demonstrate the superior performance and robustness of the proposed method versus state-of-the-art PS approaches in a variety of experiments.