We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos.
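The generative idea in the abstract, a conditional model over the change in pose at each step, can be illustrated with a toy autoregressive rollout. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `ToyMotionPrior`, the dimensions, the random linear maps standing in for trained networks, and the Gaussian prior conditioned on the previous state are all hypothetical choices made for illustration. The training-time encoder of the conditional variational autoencoder is omitted; only sampling is shown.

```python
import math
import random

STATE_DIM = 4   # hypothetical toy state size (a real pose state is far larger)
LATENT_DIM = 2  # hypothetical latent size


def make_linear(n_out, n_in, rng, scale=0.1):
    """Random weight matrix standing in for a trained network layer."""
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ToyMotionPrior:
    """Toy stand-in for a conditional generative model of per-step pose change."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Assumed conditional prior p(z | previous state): mean and log-variance.
        self.w_mu = make_linear(LATENT_DIM, STATE_DIM, self.rng)
        self.w_logvar = make_linear(LATENT_DIM, STATE_DIM, self.rng)
        # Decoder maps (z, previous state) to a change in state.
        self.w_dec = make_linear(STATE_DIM, LATENT_DIM + STATE_DIM, self.rng)

    def sample_latent(self, prev_state):
        """Draw z from the Gaussian prior conditioned on the previous state."""
        mu = matvec(self.w_mu, prev_state)
        logvar = matvec(self.w_logvar, prev_state)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def step(self, prev_state):
        """Sample a change in state and add it to the previous state."""
        z = self.sample_latent(prev_state)
        delta = matvec(self.w_dec, z + prev_state)  # list concatenation
        return [s + d for s, d in zip(prev_state, delta)]

    def rollout(self, init_state, num_steps):
        """Generate a sequence by repeatedly applying step()."""
        states = [list(init_state)]
        for _ in range(num_steps):
            states.append(self.step(states[-1]))
        return states
```

Calling `ToyMotionPrior(seed=0).rollout([0.0, 0.1, 0.2, 0.3], 5)` yields the initial state followed by five sampled states. Predicting a change rather than an absolute state keeps each step anchored to the previous one, which is the property the abstract emphasizes.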