In this paper, we introduce a method to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video. Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces exerted on the human body. The main contributions of this work are three-fold. First, we introduce an approach to jointly estimate the motion and the actuation forces of the person on the manipulated object by modeling contacts and the dynamics of the interactions. This is cast as a large-scale trajectory optimization problem. Second, we develop a method to automatically recognize from the input video the 2D position and timing of contacts between the person and the object or the ground, thereby significantly simplifying the complexity of the optimization. Third, we validate our approach on a recent video+MoCap dataset capturing typical parkour actions, and demonstrate its performance on a new dataset of Internet videos showing people manipulating a variety of tools in unconstrained environments.
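To make the trajectory-optimization claim concrete, a schematic form of the problem is sketched below. The notation is illustrative rather than the paper's own: $q$ denotes the stacked human and object configuration trajectory, $\tau$ the joint torques, $f_k$ the force at the $k$-th contact, $\ell_{\mathrm{2D}}$ a 2D reprojection data term, and $\omega_\tau, \omega_f$ regularization weights.

\begin{align*}
\min_{q,\,\tau,\,\{f_k\}} \ & \int_0^T \Big( \ell_{\mathrm{2D}}\big(q(t)\big) + \omega_\tau \|\tau(t)\|^2 + \omega_f \sum_k \|f_k(t)\|^2 \Big)\,dt \\
\text{s.t.}\quad & M(q)\,\ddot{q} + c(q,\dot{q}) = \tau + \sum_k J_k(q)^\top f_k && \text{(full-body dynamics)} \\
& f_k \in \mathcal{K}_k && \text{(friction-cone constraints)},
\end{align*}

where each contact force $f_k$ may be nonzero only during the contact intervals recognized from the video; this is how the detected 2D contact positions and timings reduce the search space of the optimization.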