Estimating human motion from video is an active research area due to its many potential applications. Most state-of-the-art methods predict human shape and posture from individual images and do not leverage the temporal information available in video. Many "in the wild" sequences of human motion are captured by a moving camera, which conflates camera motion with human motion and complicates the estimation. We therefore present BodySLAM, a monocular SLAM system that jointly estimates the position, shape, and posture of human bodies, as well as the camera trajectory. We also introduce a novel human motion model that constrains sequential body postures and makes the scale of the scene observable. Through a series of experiments on video sequences of human motion captured by a moving monocular camera, we demonstrate that BodySLAM improves estimates of all human body parameters and camera poses compared to estimating them separately.