Learning Variational Motion Prior for Video-based Motion Capture

Motion capture from monocular video is fundamental to letting people naturally experience and interact with each other in Virtual Reality (VR) and Augmented Reality (AR). However, existing methods still struggle with challenging cases involving self-occlusion and complex poses because they lack effective motion prior modeling. In this paper, we present a novel variational motion prior (VMP) learning approach for video-based motion capture to resolve this issue. Instead of directly building the correspondence between the video and motion domains, we propose to learn a generic latent space that captures the prior distribution of all natural motions and serves as the basis for subsequent video-based motion capture tasks. To improve the generalization capacity of the prior space, we propose a transformer-based variational autoencoder pretrained on marker-based 3D mocap data, with a novel style-mapping block to boost the generation quality. Afterward, a separate video encoder is attached to the pretrained motion generator for end-to-end fine-tuning on task-specific video datasets. Compared to existing motion prior models, our VMP model serves as a motion rectifier that can effectively reduce temporal jittering and failure modes in frame-wise pose estimation, leading to temporally stable and visually realistic motion capture results. Furthermore, our VMP-based framework models motion at the sequence level and can directly generate motion clips in a single forward pass, achieving real-time motion capture during inference. Extensive experiments on both public datasets and in-the-wild videos demonstrate the efficacy and generalization capability of our framework.
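The abstract does not state the training objective of the variational autoencoder. As a rough illustration only, the sketch below shows the standard VAE ingredients such a prior would typically rely on: the reparameterization trick and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. All function names, the mean squared reconstruction term, and the `beta` weight are assumptions made for this example and are not taken from the paper; the transformer encoder, the style-mapping block, and the video encoder are omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def vae_loss(target, recon, mu, log_var, beta=1.0):
    """Mean squared reconstruction error plus a beta-weighted KL term.

    `beta` is a hypothetical weighting knob, not a value from the paper.
    """
    mse = sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)
    return mse + beta * kl_to_standard_normal(mu, log_var)
```

In a two-stage setup like the one described, a loss of this kind would plausibly be used while pretraining on mocap data, with the KL term keeping the latent space close to a distribution that the later video encoder can target. That reading is an interpretation, not a detail confirmed by the abstract.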
