Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that other valid reconstructions fit the image evidence equally well. In this paper we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to obtaining varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient to evaluate pose predictions under such severe ambiguity. To address this, we release MultiHands, the first dataset with multiple plausible annotations per image. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstruction and show that explicit ambiguity modeling is better suited to this challenging problem.
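For reference, two standard quantities underlying the approach sketched above can be written out explicitly. The exact form of the determinant magnitude regularizer and the kernel used for evaluation are not specified here, so the following is a generic sketch: a conditional normalizing flow f_theta maps a hand pose y, conditioned on image features x, to a latent code with density p(z), and its conditional log-likelihood follows the change-of-variables formula; the maximum mean discrepancy (MMD) between a set of predicted samples P and the set of plausible annotations Q can be computed with any characteristic kernel k.

\[
\log p_\theta(y \mid x) \;=\; \log p\!\left(f_\theta^{-1}(y; x)\right) \;+\; \log \left| \det \frac{\partial f_\theta^{-1}(y; x)}{\partial y} \right|
\]
\[
\mathrm{MMD}^2(P, Q) \;=\; \mathbb{E}_{y, y' \sim P}\!\left[k(y, y')\right] \;+\; \mathbb{E}_{\hat{y}, \hat{y}' \sim Q}\!\left[k(\hat{y}, \hat{y}')\right] \;-\; 2\, \mathbb{E}_{y \sim P,\, \hat{y} \sim Q}\!\left[k(y, \hat{y})\right]
\]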