Hand pose estimation is a crucial component of a wide range of augmented reality and human-computer interaction applications. Predicting the 3D hand pose from a single RGB image is challenging due to occlusion and depth ambiguities. GCN-based (Graph Convolutional Network) methods exploit the structural similarity between graphs and hand joints to model kinematic dependencies between joints. These techniques use predefined or globally learned joint relationships, which may fail to capture pose-dependent constraints. To address this problem, we propose a two-stage GCN-based framework that learns per-pose relationship constraints. Specifically, the first stage quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality. This spatial dependency information guides the first stage to estimate reliable 2D and 3D poses. The second stage further refines the 3D estimation through a GCN-based module that uses an adaptive nearest-neighbor algorithm to determine joint relationships. Extensive experiments show that our multi-stage GCN approach yields an efficient model that produces accurate 2D/3D hand poses and outperforms the state of the art on two public datasets.
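To make the second-stage idea concrete, the following is a minimal sketch of how a pose-dependent joint graph could be built with an adaptive nearest-neighbor rule and used in a graph convolution. It is an illustrative assumption, not the paper's implementation: the function and layer names (adaptive_knn_adjacency, GraphConv), the choice of k, and the use of PyTorch tensors are all hypothetical.

```python
import torch
import torch.nn as nn

def adaptive_knn_adjacency(joint_feats, k=4):
    """Connect each joint to its k nearest neighbors in feature space,
    giving a per-pose adjacency (hypothetical stand-in for the paper's
    adaptive nearest-neighbor step)."""
    # joint_feats: (B, J, C) per-joint feature vectors
    dist = torch.cdist(joint_feats, joint_feats)              # (B, J, J) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[..., 1:]    # drop self (distance 0)
    B, J, _ = joint_feats.shape
    adj = torch.zeros(B, J, J, device=joint_feats.device)
    adj.scatter_(2, idx, 1.0)                                  # mark selected neighbors
    adj = adj + torch.eye(J, device=joint_feats.device)        # add self-loops
    return adj / adj.sum(dim=2, keepdim=True)                  # row-normalize

class GraphConv(nn.Module):
    """Plain graph convolution: aggregate neighbor features, then project."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (B, J, J) normalized adjacency, x: (B, J, C)
        return torch.relu(self.proj(adj @ x))

# Usage sketch: refine 21 hand-joint features with a pose-dependent graph.
feats = torch.randn(2, 21, 64)            # batch of 2 poses, 21 joints, 64-d features
adj = adaptive_knn_adjacency(feats, k=4)
refined = GraphConv(64, 64)(feats, adj)   # (2, 21, 64)
```

Because the adjacency is recomputed from each pose's own features, the joint relationships adapt per sample rather than being fixed by a predefined hand skeleton.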