We present a self-trainable method, Mask2Hand, which learns to solve the challenging task of predicting 3D hand pose and shape from a 2D binary mask of the hand silhouette/shadow without additional manually-annotated data. Given the intrinsic camera parameters and the parametric hand model in the camera space, we adopt differentiable rendering to project 3D estimations onto the 2D binary silhouette space. By applying a tailored combination of losses between the rendered silhouette and the input binary mask, we are able to integrate a self-guidance mechanism into our end-to-end optimization process for constraining global mesh registration and hand pose estimation. The experiments show that our method, which takes a single binary mask as input, achieves prediction accuracy comparable to that of state-of-the-art methods requiring RGB or depth inputs, in both unaligned and aligned settings. Our code is available at https://github.com/lijenchang/Mask2Hand.
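To make the silhouette-based self-guidance concrete, the following is a minimal sketch of a differentiable silhouette loss of the kind described above, written with PyTorch3D's soft silhouette renderer. The camera setup, the `silhouette_loss` function, and the soft-IoU formulation are illustrative assumptions for this sketch and are not taken from the Mask2Hand implementation, which uses its own tailored combination of losses.

```python
# Hypothetical sketch: render a predicted hand mesh into a soft silhouette and
# compare it against the input binary mask. The intrinsics handling and the
# soft-IoU loss below are illustrative assumptions, not the authors' exact code.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    PerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftSilhouetteShader, BlendParams,
)

def silhouette_loss(verts, faces, target_mask, focal, principal_point, image_size=224):
    """verts: (B, V, 3) predicted hand mesh in camera space
       faces: (F, 3) mesh faces shared across the batch
       target_mask: (B, H, W) input binary silhouette mask"""
    device = verts.device
    batch_size = verts.shape[0]

    # Camera defined by the given intrinsic parameters (screen-space convention).
    cameras = PerspectiveCameras(
        focal_length=focal, principal_point=principal_point,
        image_size=((image_size, image_size),), in_ndc=False, device=device,
    )
    raster_settings = RasterizationSettings(
        image_size=image_size, blur_radius=1e-4, faces_per_pixel=50,
    )
    renderer = MeshRenderer(
        rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
        shader=SoftSilhouetteShader(blend_params=BlendParams(sigma=1e-4)),
    )

    meshes = Meshes(verts=verts, faces=faces.unsqueeze(0).expand(batch_size, -1, -1))
    rendered = renderer(meshes)[..., 3]  # alpha channel = soft silhouette in [0, 1]

    # Soft IoU between rendered silhouette and input mask; gradients flow back
    # to the hand pose and shape parameters through the renderer.
    inter = (rendered * target_mask).sum(dim=(1, 2))
    union = (rendered + target_mask - rendered * target_mask).sum(dim=(1, 2))
    return (1.0 - inter / union.clamp(min=1e-6)).mean()
```

In this setup, the loss is fully differentiable with respect to the mesh vertices, so it can supervise a network that regresses hand pose and shape without any manually-annotated 3D labels, matching the self-guidance idea described in the abstract.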