We present a new multi-stream 3D mesh reconstruction network (MSMR-Net) for hand pose estimation from a single RGB image. Our model consists of an image encoder followed by a mesh decoder built from connected graph-convolution layers. In contrast to previous models that use a single mesh decoding path, our decoder incorporates multiple cross-resolution trajectories that are executed in parallel. Global and local information is thus shared across trajectories to form rich decoding representations, at minor additional parameter cost compared to a single-trajectory network. We demonstrate the effectiveness of our method in hand-hand and hand-object interaction scenarios at various levels of interaction. To evaluate the former scenario, we propose a method to generate RGB images of closely interacting hands. Moreover, we suggest a metric to quantify the degree of interaction and show that close hand interactions are particularly challenging. Experimental results show that MSMR-Net outperforms existing algorithms on the hand-object FreiHAND dataset as well as on our own hand-hand dataset.
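The cross-resolution idea can be sketched as two parallel graph-convolution streams, one per mesh resolution, that exchange features at every layer. This is a minimal illustrative sketch, not the paper's implementation: the vertex counts, feature width, random adjacencies, and the up/down-sampling matrices `U` and `D` are all assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_conv(X, A, W):
    """One graph-convolution layer: mean-aggregate neighbours, project, ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum((A / deg) @ X @ W, 0.0)

def random_mesh_adjacency(n, p=0.1):
    """Symmetric adjacency with self-loops (stand-in for a real mesh graph)."""
    A = (rng.random((n, n)) < p).astype(float)
    return np.minimum(A + A.T + np.eye(n), 1.0)

# Hypothetical mesh resolutions and feature width (not taken from the paper).
n_coarse, n_fine, d = 49, 196, 32
A_c, A_f = random_mesh_adjacency(n_coarse), random_mesh_adjacency(n_fine)

# Row-normalised up/down-sampling matrices linking the two resolutions.
U = rng.random((n_fine, n_coarse)); U /= U.sum(axis=1, keepdims=True)
D = rng.random((n_coarse, n_fine)); D /= D.sum(axis=1, keepdims=True)

W_c = 0.1 * rng.standard_normal((d, d))
W_f = 0.1 * rng.standard_normal((d, d))

# Two parallel decoding streams that exchange features every layer, so each
# resolution sees both global (coarse) and local (fine) context.
X_c = rng.standard_normal((n_coarse, d))
X_f = U @ X_c  # initialise the fine stream from the coarse one
for _ in range(2):
    X_c, X_f = (graph_conv(X_c, A_c, W_c) + D @ X_f,
                graph_conv(X_f, A_f, W_f) + U @ X_c)

print(X_c.shape, X_f.shape)  # per-vertex features at each resolution
```

The extra parameter cost over a single trajectory is only the per-stream weight matrices and the fixed sampling operators, consistent with the "minor additional parameter cost" claim.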