We propose a method to reconstruct the global trajectories of humans from videos in the wild. Our optimization method decouples camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model camera motion, and methods that rely on background pixels to infer 3D human motion usually require a full scene reconstruction, which is often infeasible for in-the-wild videos. However, even when existing SLAM systems cannot recover an accurate scene reconstruction, the motion of background pixels still provides enough signal to constrain the camera motion. We show that relative camera estimates, combined with data-driven human motion priors, can resolve the scene scale ambiguity and recover global human trajectories. Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as those in PoseTrack. We quantify our improvement over existing methods on the EgoBody 3D human dataset. We further demonstrate that our recovered camera scale allows us to reason about the motion of multiple people in a shared coordinate frame, which improves downstream tracking performance on PoseTrack. Code and video results can be found at https://vye16.github.io/slahmr.
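To make the scale-ambiguity argument concrete, here is a minimal sketch (not the authors' code; all function names, the expected-step prior, and the grid-search formulation are illustrative assumptions). SLAM recovers camera translations only up to an unknown global scale s, while per-frame 3D pose gives each person's root translation in camera coordinates. Writing the person's world position as p_k = R_k t_k + s c_k, a human motion prior that expects a plausible per-frame step length constrains s:

```python
import numpy as np

def solve_scene_scale(R, t, c, expected_step=0.7):
    """Hypothetical illustration: recover the SLAM scale s so that the
    person's world displacements match a prior's expected step length.

    R : (K, 3, 3) camera-to-world rotations
    t : (K, 3)    human root translations in camera coordinates
    c : (K, 3)    SLAM camera translations, known only up to scale s
    """
    # World position as a function of s:  p_k(s) = a_k + s * b_k
    a = np.einsum('kij,kj->ki', R, t)   # R_k @ t_k (scale-free part)
    b = c                               # part multiplied by the scale
    da = np.diff(a, axis=0)             # per-frame displacement pieces
    db = np.diff(b, axis=0)
    # Penalize deviation of each step length from the prior's expectation;
    # a 1-D grid search keeps the sketch transparent (a real system would
    # optimize this jointly with the pose variables).
    scales = np.linspace(0.1, 10.0, 1000)
    steps = np.linalg.norm(da[None] + scales[:, None, None] * db[None], axis=-1)
    cost = ((steps - expected_step) ** 2).sum(axis=1)
    return scales[np.argmin(cost)]
```

Because the camera and human terms move the world trajectory in different directions as s varies, only the correct scale makes all displacements simultaneously consistent with the prior, which is the intuition behind combining relative camera estimates with motion priors.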