We propose a method for object-aware 3D egocentric pose estimation that tightly integrates kinematics modeling, dynamics modeling, and scene object information. Unlike prior kinematics- or dynamics-based approaches, where the two components are used disjointly, we synergize the two via dynamics-regulated training. At each timestep, a kinematic model provides a target pose from video evidence and the simulation state. A pre-learned dynamics model then attempts to mimic the kinematic pose in a physics simulator. By comparing the pose instructed by the kinematic model against the pose generated by the dynamics model, we can use their misalignment to further improve the kinematic model. By factoring in the 6DoF pose of objects (e.g., chairs, boxes) in the scene, we demonstrate, for the first time, the ability to estimate physically plausible 3D human-object interactions using a single wearable camera. We evaluate our egocentric pose estimation method in both controlled laboratory settings and real-world scenarios.
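To make the per-timestep loop concrete, below is a minimal PyTorch sketch of dynamics-regulated training as described above. All interfaces here (`kin_model`, `dyn_policy`, `sim`, and their methods) are hypothetical placeholders standing in for the components named in the abstract, not the authors' actual API.

```python
import torch
import torch.nn.functional as F

def dynamics_regulated_step(kin_model, dyn_policy, sim, video_feat):
    """One timestep of dynamics-regulated training (hypothetical interfaces).

    kin_model:  trainable kinematic model mapping video evidence + sim state to a pose
    dyn_policy: pre-learned (frozen) dynamics model mapping state + target pose to an action
    sim:        physics simulator whose step() returns the resulting humanoid pose tensor
    """
    sim_state = sim.get_state()                     # current state of the simulated humanoid
    target_pose = kin_model(video_feat, sim_state)  # kinematic model proposes a target pose
    action = dyn_policy(sim_state, target_pose)     # frozen dynamics model mimics the target
    sim_pose = sim.step(action)                     # pose actually realized in the simulator
    # The simulator is typically non-differentiable, so the simulated pose is
    # treated as a fixed target; the misalignment supervises the kinematic model.
    return F.mse_loss(target_pose, sim_pose.detach())

# Illustrative outer loop: only the kinematic model's parameters are updated.
# optimizer = torch.optim.Adam(kin_model.parameters(), lr=1e-4)
# for video_feat in video_features:   # per-frame egocentric video features
#     loss = dynamics_regulated_step(kin_model, dyn_policy, sim, video_feat)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```

The key design point the sketch illustrates is the feedback direction: the physics simulator regulates the kinematic model during training, rather than the two running as disjoint stages.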