Estimating 3D interacting hand pose from a single RGB image is essential for understanding human actions. Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately. In this way, it is straightforward to take advantage of the latest research progress on single-hand pose estimation. However, hand pose estimation in interacting scenarios is very challenging, due to (1) severe hand-hand occlusion and (2) ambiguity caused by the homogeneous appearance of hands. To tackle these two challenges, we propose a novel Hand De-occlusion and Removal (HDR) framework that performs hand de-occlusion and distractor removal. We also propose the first large-scale synthetic amodal hand dataset, termed the Amodal InterHand Dataset (AIH), to facilitate model training and promote the development of related research. Experiments show that the proposed method significantly outperforms previous state-of-the-art interacting hand pose estimation approaches. Code and data are available at https://github.com/MengHao666/HDR.