Estimating the pose and shape of hands and objects under interaction has numerous applications, including augmented and virtual reality. Existing approaches to hand and object reconstruction require explicitly defined physical constraints and known objects, which limits their application domains. Our algorithm is agnostic to object models and learns the physical rules governing hand-object interaction. This requires automatically inferring the shapes and physical interactions of hands and (potentially unknown) objects. We approach this challenging problem by proposing a collaborative learning strategy in which two branches of deep networks learn from each other. Specifically, we transfer hand mesh information to the object branch and, conversely, object information to the hand branch. The resulting optimisation (training) problem can be unstable, and we address this via two strategies: (i) an attention-guided graph convolution that helps identify and focus on mutual occlusions, and (ii) an unsupervised associative loss that facilitates the transfer of information between the branches. Experiments on four widely used benchmarks show that our framework surpasses state-of-the-art accuracy in 3D pose estimation and also recovers dense 3D hand and object shapes. Ablation studies confirm that each technical component contributes meaningfully.
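The abstract names attention-guided graph convolution as one of the two stabilising strategies. The following is a minimal NumPy sketch of one plausible form of such a layer, in which per-node attention scores reweight a graph's edges before message passing; the function name, the specific attention formulation, and all shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def attention_guided_graph_conv(X, A, W, w_att):
    """Illustrative attention-guided graph convolution (assumed form).

    X     : (N, F) node features (e.g. mesh vertex features)
    A     : (N, N) binary adjacency matrix with self-loops
    W     : (F, F_out) learnable feature transform
    w_att : (F,) learnable attention projection
    """
    scores = X @ w_att                          # per-node attention scores, shape (N,)
    att = A * np.exp(scores)[None, :]           # reweight edges by neighbour attention
    att = att / (att.sum(axis=1, keepdims=True) + 1e-8)  # row-normalise (softmax over neighbours)
    return np.maximum(att @ X @ W, 0.0)         # aggregate, transform, ReLU
```

In this sketch, nodes whose features produce high attention scores (e.g. vertices near a mutual occlusion) contribute more to their neighbours' aggregated messages, which is the intuition behind focusing the convolution on occluded regions.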