SmartMocap: Joint Estimation of Human and Camera Motion using Uncalibrated RGB Cameras

Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either require calibrated cameras or calibrate them relative to a static camera, which then serves as the reference frame of the mocap system. This calibration must be performed in advance for every capture session, which is tedious, and it must be repeated whenever a camera is moved, intentionally or accidentally. In this paper, we propose a mocap method that uses multiple static and moving RGB cameras without extrinsic calibration. The key components of our method are as follows. First, since both the cameras and the subject can move freely, we select the ground plane as a common reference frame for representing both the body and the camera motions, unlike existing methods, which represent bodies in the camera coordinate frame. Second, we learn a probability distribution of short human motion sequences ($\sim$1 s) relative to the ground plane and leverage it to disambiguate between camera motion and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach that fits the SMPL human body model and the camera poses to the human body keypoints in the images. Finally, we show that our method works on a variety of datasets, ranging from aerial cameras to smartphones. It also produces more accurate results than the state of the art on the task of monocular human mocap with a static camera. Our code is available for research purposes at https://github.com/robot-perception-group/SmartMocap.
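To make the idea of a shared ground-plane frame concrete, the sketch below writes down an energy of the kind such an optimization could minimize: body joints and per-frame camera poses are both expressed in one world frame attached to the ground plane, each camera's 2D keypoints contribute a confidence-weighted reprojection term, and a prior scores the root trajectory over a short window. This is a minimal illustration under stated assumptions, not the SmartMocap implementation. It uses raw 3D joint positions instead of the SMPL model, assumes a pinhole camera with known intrinsics `K`, and replaces the learned motion distribution with a hypothetical Gaussian stand-in (`gaussian_motion_prior`). All function names are illustrative.

```python
import numpy as np

def project(points_world, R, t, K):
    """Project (N, 3) points given in the ground-plane (world) frame.

    Assumed convention: X_cam = R @ X_world + t, pinhole intrinsics K (3x3).
    """
    X_cam = points_world @ R.T + t          # world (ground frame) -> camera frame
    x = X_cam @ K.T                         # apply intrinsics
    return x[:, :2] / x[:, 2:3]             # perspective division -> pixels

def reprojection_loss(joints_world, keypoints_2d, conf, R, t, K):
    """Confidence-weighted squared pixel error for one camera and one frame."""
    residual = project(joints_world, R, t, K) - keypoints_2d
    return float(np.sum(conf * np.sum(residual ** 2, axis=1)))

def gaussian_motion_prior(window, mean, cov_inv):
    """Hypothetical stand-in for a learned motion prior: Mahalanobis energy
    of a flattened short motion window (e.g. a ~1 s root trajectory)."""
    d = window.ravel() - mean
    return float(0.5 * d @ cov_inv @ d)

def total_energy(joints_seq, cam_R, cam_t, K, keypoints, conf,
                 prior_mean, prior_cov_inv, w_prior=1.0):
    """Data term over all cameras c and frames f, plus a prior on joint 0.

    joints_seq: (F, J, 3) in the ground frame
    cam_R: (C, F, 3, 3), cam_t: (C, F, 3)  per-frame poses, so cameras may move
    keypoints: (C, F, J, 2), conf: (C, F, J)
    """
    C, F = cam_R.shape[:2]
    e_data = sum(
        reprojection_loss(joints_seq[f], keypoints[c, f], conf[c, f],
                          cam_R[c, f], cam_t[c, f], K)
        for c in range(C) for f in range(F)
    )
    # Assumption: joint 0 is the root; its trajectory is scored by the prior.
    e_prior = gaussian_motion_prior(joints_seq[:, 0, :], prior_mean, prior_cov_inv)
    return e_data + w_prior * e_prior
```

Because the body trajectory lives in the ground frame rather than in any camera's frame, a moving camera only changes its own per-frame `(R, t)`, while the prior keeps evaluating the same world-frame motion. This is what lets a motion prior help separate camera motion from human motion. A static camera is simply the special case where its pose is identical across frames.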
