This paper addresses the problem of 3D hand pose estimation from a monocular RGB image. While previous methods have shown great success, the structure of hands, which is critical in pose estimation, has not been fully exploited. To this end, we propose regularized graph representation learning under a conditional adversarial learning framework for 3D hand pose estimation, aiming to capture the structural inter-dependencies of hand joints. In particular, we first estimate an initial hand pose from a parametric hand model as a structural prior, and then regularize the inference of structural deformation from this prior via residual graph convolution for accurate graph representation learning. To further optimize the hand structure, we propose two bone-constrained loss functions, which explicitly characterize the morphable structure of hand poses. In addition, we introduce an adversarial learning framework conditioned on the input image with a multi-source discriminator, which imposes structural constraints on the distribution of generated 3D hand poses to encourage anthropomorphically valid results. Extensive experiments demonstrate that our model sets a new state of the art in 3D hand pose estimation from a monocular image on five standard benchmarks.
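To make the bone-constrained idea concrete, below is a minimal sketch (not the paper's actual implementation) of a loss that penalizes both bone-length and bone-direction errors between predicted and ground-truth 3D joints. The 21-joint skeleton in `HAND_EDGES`, the loss weights, and the exact combination of terms are all illustrative assumptions; the paper's two loss functions may be defined differently.

```python
import numpy as np

# Illustrative 21-joint hand skeleton: each (parent, child) pair is a bone.
# The exact edge set used in the paper may differ.
HAND_EDGES = [(0, i) for i in (1, 5, 9, 13, 17)] + [
    (1, 2), (2, 3), (3, 4),        # thumb
    (5, 6), (6, 7), (7, 8),        # index
    (9, 10), (10, 11), (11, 12),   # middle
    (13, 14), (14, 15), (15, 16),  # ring
    (17, 18), (18, 19), (19, 20),  # little
]

def bone_vectors(joints):
    """joints: (21, 3) array of 3D joints -> (20, 3) array of bone vectors."""
    j = np.asarray(joints, dtype=np.float64)
    return np.stack([j[c] - j[p] for p, c in HAND_EDGES])

def bone_constrained_loss(pred, gt, w_len=1.0, w_dir=1.0):
    """Generic bone-length + bone-direction penalty between predicted and
    ground-truth 3D hand joints (a stand-in for the paper's bone losses)."""
    bp, bg = bone_vectors(pred), bone_vectors(gt)
    lp = np.linalg.norm(bp, axis=1)                   # predicted bone lengths
    lg = np.linalg.norm(bg, axis=1)                   # ground-truth bone lengths
    len_loss = np.mean(np.abs(lp - lg))               # length discrepancy
    cos = np.sum(bp * bg, axis=1) / (lp * lg + 1e-8)  # cosine of bone angle
    dir_loss = np.mean(1.0 - cos)                     # direction discrepancy
    return w_len * len_loss + w_dir * dir_loss
```

A loss of this shape is zero when predicted bones match the ground truth in both length and orientation, and grows with either kind of structural violation, which is what lets it regularize anatomically implausible poses.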