A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks, such as providing robustness to video-based human pose estimation, predicting complete body movements for motion capture systems during occlusions, and assisting keyframe animation with plausible movements. In this paper, we present a method for learning complex human motions independent of specific tasks, using a combined global and local latent space to facilitate coarse and fine-grained modeling. Specifically, we propose a hierarchical motion variational autoencoder (HM-VAE) with a two-level hierarchical latent space. The global latent space captures the overall body motion, while the local latent space captures the refined poses of the different body parts. We demonstrate the effectiveness of HM-VAE in a variety of tasks, including video-based human pose estimation, motion completion from partial observations, and motion synthesis from sparse keyframes. Even though our model has not been trained for any of these tasks specifically, it outperforms task-specific alternatives. Our general-purpose human motion prior can fix corrupted human body animations and generate complete movements from incomplete observations.
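To make the idea of a two-level latent space concrete, the following is a minimal sketch and not the HM-VAE implementation. It assumes diagonal Gaussian posteriors, standard normal priors at both levels, and a hypothetical grouping of the body into five parts; the abstract specifies none of these. All names (`BODY_PARTS`, `sample_two_level_latent`, and the helper functions) are illustrative. The sketch samples one global code for the whole body and one local code per body part with the reparameterization trick, and sums the KL terms that a VAE objective would penalize.

```python
import math
import random

# Hypothetical body-part grouping; the abstract does not specify one.
BODY_PARTS = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) for a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def sample_two_level_latent(global_params, local_params, rng):
    """Sample one global code and one local code per body part.

    global_params: (mu, log_var) for the whole-body latent.
    local_params: dict mapping part name -> (mu, log_var).
    Returns (z_global, z_local, total_kl).
    Assumption: both levels use a standard normal prior; a real model
    could instead condition the local prior on the global code.
    """
    g_mu, g_log_var = global_params
    z_global = reparameterize(g_mu, g_log_var, rng)
    total_kl = kl_to_standard_normal(g_mu, g_log_var)

    z_local = {}
    for part, (mu, log_var) in local_params.items():
        z_local[part] = reparameterize(mu, log_var, rng)
        total_kl += kl_to_standard_normal(mu, log_var)
    return z_global, z_local, total_kl


if __name__ == "__main__":
    rng = random.Random(0)
    # Posterior parameters would normally come from an encoder network;
    # here they are fixed placeholders (zero mean, unit variance).
    global_params = ([0.0] * 4, [0.0] * 4)
    local_params = {p: ([0.0] * 2, [0.0] * 2) for p in BODY_PARTS}
    z_g, z_l, kl = sample_two_level_latent(global_params, local_params, rng)
    print(len(z_g), sorted(z_l), kl)
```

In a full model, a decoder would take the global code together with the local codes to reconstruct the motion sequence; that part is omitted here because the abstract gives no architectural detail.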