3D human pose estimation from a monocular video has recently seen significant improvements. However, most state-of-the-art methods are kinematics-based, which makes them prone to physically implausible motions with pronounced artifacts. Current dynamics-based methods can predict physically plausible motion but are restricted to simple scenarios with a static camera view. In this work, we present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos captured with a moving camera. D&D introduces inertial force control (IFC) to explain the 3D human motion in the non-inertial local frame by considering the inertial forces induced by the dynamic camera. To learn ground contact with limited annotations, we develop probabilistic contact torque (PCT), which is computed by differentiable sampling from contact probabilities and used to generate motions. The contact state can thus be weakly supervised by encouraging the model to generate correct motions. Furthermore, we propose an attentive PD controller that adjusts target pose states using temporal information to obtain smooth and accurate pose control. Our approach is entirely neural-based and runs without offline optimization or simulation in physics engines. Experiments on large-scale 3D human motion benchmarks demonstrate the effectiveness of D&D, where we exhibit superior performance over both state-of-the-art kinematics-based and dynamics-based methods. Code is available at https://github.com/Jeffsjtu/DnD
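To make the probabilistic contact torque (PCT) idea concrete, the sketch below shows one way differentiable sampling from contact probabilities could gate contact torques so that gradients from a motion reconstruction loss reach the contact classifier. This is a minimal illustration, not the paper's implementation: the Gumbel-Softmax relaxation, the tensor shapes, and the names `probabilistic_contact_torque`, `contact_logits`, `contact_jacobian`, and `ground_force` are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def probabilistic_contact_torque(contact_logits, contact_jacobian, ground_force, tau=1.0):
    """Hypothetical sketch of PCT via differentiable contact sampling.

    contact_logits:   (B, J, 2)     per-joint logits for [no-contact, contact]
    contact_jacobian: (B, J, D, 3)  maps a 3D contact force at each joint to D generalized DoFs
    ground_force:     (B, J, 3)     predicted ground-reaction force per joint
    Returns:          (B, D)        generalized torque from sampled contacts
    """
    # Differentiable (straight-through) sample of the binary contact state.
    # The forward pass uses a hard one-hot sample; gradients still flow back
    # into contact_logits, which is what enables weak supervision of contact.
    contact = F.gumbel_softmax(contact_logits, tau=tau, hard=True)[..., 1]  # (B, J)

    # Torque contributed by each joint's contact force, gated by the sampled state.
    per_joint_tau = torch.einsum('bjdc,bjc->bjd', contact_jacobian, ground_force)  # (B, J, D)
    return (contact.unsqueeze(-1) * per_joint_tau).sum(dim=1)  # (B, D)
```

Under this assumed scheme, a sequence-level loss on the generated motion alone would push the contact logits toward states that produce physically consistent trajectories, matching the weak-supervision behavior described above.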