Realistic reconstruction of two hands interacting with objects is a new and challenging problem that is essential for building personalized Virtual and Augmented Reality environments. Graph Convolutional Networks (GCNs) preserve the topology of hand poses and shapes by modeling them as graphs. In this work, we propose THOR-Net, which combines the power of GCNs, Transformers, and self-supervision to realistically reconstruct two hands and an object from a single RGB image. Our network comprises two stages: a feature extraction stage and a reconstruction stage. In the feature extraction stage, a Keypoint RCNN extracts 2D poses, feature maps, heatmaps, and bounding boxes from a monocular RGB image. This 2D information is then modeled as two graphs and passed to the two branches of the reconstruction stage. The shape reconstruction branch estimates meshes of the two hands and the object using our novel coarse-to-fine GraFormer shape network. The other branch reconstructs the 3D poses of the hands and objects using a GraFormer network. Finally, a self-supervised photometric loss is used to directly regress a realistic texture for each vertex in the hand meshes. Our approach achieves state-of-the-art results in hand shape estimation on the HO-3D dataset (10.0 mm), improving on ArtiBoost (10.8 mm). It also surpasses other methods in hand pose estimation on the challenging two hands and object (H2O) dataset by 5 mm on the left-hand pose and 1 mm on the right-hand pose.
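To make the graph modeling concrete, the following is a minimal sketch of how 2D hand keypoints could be treated as graph nodes and passed through a single plain graph-convolution step. It is not the GraFormer network used in THOR-Net. The 21-joint skeleton layout, the function names, the placeholder keypoints, and the weight matrix are all assumptions introduced for illustration.

```python
import math

NUM_JOINTS = 21  # assumption: wrist plus 4 joints for each of 5 fingers


def hand_edges():
    """Edges of a hypothetical 21-joint hand skeleton (wrist is node 0)."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to the first joint of this finger
        for j in range(base, base + 3):
            edges.append((j, j + 1))  # chain along the finger
    return edges


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected skeleton
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def graph_conv(a_hat, x, w):
    """One graph-convolution step: ReLU(A_hat @ X @ W)."""
    n, f_in, f_out = len(x), len(w), len(w[0])
    # Aggregate each node's features with those of its skeleton neighbours.
    agg = [[sum(a_hat[i][k] * x[k][f] for k in range(n)) for f in range(f_in)]
           for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * w[f][o] for f in range(f_in)))
             for o in range(f_out)]
            for i in range(n)]


edges = hand_edges()
a_hat = normalized_adjacency(NUM_JOINTS, edges)
# Placeholder 2D keypoints; in practice these would come from a keypoint detector.
keypoints_2d = [[float(i), float(i % 4)] for i in range(NUM_JOINTS)]
# Hypothetical weights mapping 2 input features to 3 output features.
weights = [[0.5, -0.5, 1.0], [1.0, 0.5, -1.0]]
features = graph_conv(a_hat, keypoints_2d, weights)
```

Because the adjacency matrix encodes the skeleton, each joint's output feature depends only on itself and its directly connected joints, which is the sense in which a graph formulation preserves hand topology.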