Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for traditional outside-in motion capture with external cameras. However, existing methods have several limitations. A prominent problem is that the estimated poses lie in the local coordinate system of the fisheye camera, rather than in the world coordinate system, which is restrictive for many applications. Furthermore, these methods suffer from limited accuracy and temporal instability due to ambiguities caused by the monocular setup and the severe occlusion in a strongly distorted egocentric perspective. To tackle these limitations, we present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. To achieve accurate and temporally stable global poses, a spatio-temporal optimization is performed over a sequence of frames by minimizing heatmap reprojection errors and enforcing local and global body motion priors learned from a mocap dataset. Experimental results show that our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
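The distinction between local (camera-frame) and global (world-frame) poses can be illustrated with a minimal sketch. It assumes a per-frame camera-to-world rotation `R_wc` and translation `t_wc` are available from some external source; these names, the helper functions, and the toy numbers are hypothetical and are not taken from the paper. The jitter measure is a simple second-difference proxy for temporal instability, not the learned motion prior the abstract refers to.

```python
import numpy as np


def camera_to_world(joints_cam, R_wc, t_wc):
    """Map (J, 3) joint positions from the camera frame to the world frame.

    R_wc (3x3) and t_wc (3,) are an assumed camera-to-world rotation and
    translation for one frame (hypothetical inputs for this illustration).
    """
    # Row-vector form of p_world = R_wc @ p_cam + t_wc, applied to every joint.
    return joints_cam @ R_wc.T + t_wc


def temporal_jitter(poses):
    """Mean squared frame-to-frame acceleration of a (T, J, 3) pose sequence."""
    # Second finite difference along time; zero for constant-velocity motion.
    accel = poses[2:] - 2.0 * poses[1:-1] + poses[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))


# Toy data: two joints expressed in the camera frame.
joints_cam = np.array([[0.0, 0.0, 1.0],
                       [1.0, 0.0, 0.0]])

# Assumed camera pose: 90 degree rotation about the z-axis, 1.5 m above the origin.
R_wc = np.array([[0.0, -1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])
t_wc = np.array([0.0, 0.0, 1.5])

joints_world = camera_to_world(joints_cam, R_wc, t_wc)
print(joints_world)
```

The same local pose maps to different world-frame positions as the head-mounted camera moves, which is why a camera-frame estimate alone does not determine where the body is in the scene.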