Monocular 3D human pose estimation from a single RGB image has received much attention in the past few years. However, pose inference models with competitive performance require supervision with 3D pose ground truth data, or at least known pose priors in their target domain. These data requirements may not be achievable in many real-world applications with data collection constraints. In this paper, we present a heuristic weakly supervised solution, called HW-HuP, to estimate 3D human pose in contexts where no ground truth 3D data is accessible, even for fine-tuning. HW-HuP learns partial pose priors from public 3D human pose datasets and uses easy-to-access observations from the target domain to iteratively estimate 3D human pose and shape in a hybrid optimization and regression cycle. In our design, depth data is employed as auxiliary weak supervision during training, yet it is not needed at inference time. We evaluate HW-HuP qualitatively on datasets of both in-bed human and infant poses, where neither ground truth 3D pose nor any target prior is available. We also evaluate HW-HuP quantitatively against 3D ground truth on a publicly available motion capture dataset. HW-HuP can also be extended to other input modalities for pose estimation, especially under adverse vision conditions such as occlusion or full darkness. On the Human3.6M benchmark, HW-HuP achieves 104.1mm MPJPE and 50.4mm PA-MPJPE, comparable to existing state-of-the-art approaches that benefit from full 3D pose supervision.
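The hybrid optimization and regression cycle with depth as weak supervision can be illustrated with a minimal toy sketch. This is not the HW-HuP implementation: the linear "regressor", the joint count, the feature vector, and the depth-only refinement loss are all simplified stand-ins, chosen only to show the cycle's structure (regress a pose, refine it against observed depth, then use the refined pose as a pseudo ground truth to update the regressor).

```python
import numpy as np

rng = np.random.default_rng(0)

J = 4                                             # toy joint count (assumption)
W = rng.normal(scale=0.1, size=(J * 3, 8))        # toy linear "regressor" weights

def regress(x, W):
    """Regression stage: predict J 3D joints from an image feature vector."""
    return (W @ x).reshape(J, 3)

def refine(joints, depth_obs, steps=50, lr=0.1):
    """Optimization stage: gradient steps pulling each joint's z-coordinate
    toward the observed depth (the weak supervision signal)."""
    j = joints.copy()
    for _ in range(steps):
        grad = np.zeros_like(j)
        grad[:, 2] = 2 * (j[:, 2] - depth_obs)    # d/dz of squared depth error
        j -= lr * grad
    return j

# One cycle: regress -> refine with depth -> treat the refined pose as a
# pseudo ground truth and take one least-squares step on the regressor.
x = rng.normal(size=8)                            # stand-in image features
depth_obs = rng.normal(size=J)                    # stand-in sensor depth

pred = regress(x, W)
pseudo_gt = refine(pred, depth_obs)

err = pseudo_gt.reshape(-1) - W @ x               # residual toward pseudo GT
W += 0.01 * np.outer(err, x)                      # one gradient-style update

depth_err_before = np.abs(pred[:, 2] - depth_obs).mean()
depth_err_after = np.abs(regress(x, W)[:, 2] - depth_obs).mean()
```

After the update, the regressor's depth error shrinks, mirroring how the refined estimates supervise the network in the next iteration of the cycle; at inference time only `regress` is needed, so depth is never required as an input.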