In this paper, we consider the challenging task of simultaneously locating and recovering multiple hands from a single 2D image. Previous studies either focus on single-hand reconstruction or solve this problem in a multi-stage way. In particular, the conventional two-stage pipeline first detects hand areas and then estimates the 3D hand pose from each cropped patch. To reduce the computational redundancy in preprocessing and feature extraction, we propose a concise yet efficient single-stage pipeline. Specifically, we design a multi-head auto-encoder structure for multi-hand reconstruction, in which all head networks share the same feature map and output the hand center, pose and texture, respectively. In addition, we adopt a weakly-supervised scheme to alleviate the burden of expensive real-world 3D data annotation. To this end, we propose a series of losses optimized by a stage-wise training scheme, for which a multi-hand dataset with 2D annotations is generated from publicly available single-hand datasets. To further improve the accuracy of the weakly-supervised model, we adopt several feature consistency constraints in both single-hand and multi-hand settings. Specifically, the keypoints of each hand estimated from local features should be consistent with the re-projected points predicted from global features. Extensive experiments on public benchmarks including FreiHAND, HO3D, InterHand2.6M and RHD demonstrate that our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings.
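The consistency constraint described above can be illustrated with a minimal sketch. It is not the paper's implementation: the weak-perspective camera, the squared-distance penalty, the 21-joint layout, and all function and parameter names (`project_weak_perspective`, `keypoint_consistency_loss`, `scale`, `trans_2d`) are assumptions made here for illustration only.

```python
import numpy as np

def project_weak_perspective(joints_3d, scale, trans_2d):
    """Project (N, J, 3) joints to (N, J, 2) image points.

    Assumption: a weak-perspective camera (per-hand scale and 2D translation);
    the abstract does not specify the camera model.
    """
    return scale[:, None, None] * joints_3d[..., :2] + trans_2d[:, None, :]

def keypoint_consistency_loss(local_kp_2d, joints_3d, scale, trans_2d):
    """Mean squared distance between 2D keypoints estimated from local
    features and the re-projection of 3D joints predicted from global features.
    The squared-distance form is an assumption, not the paper's stated loss.
    """
    reproj = project_weak_perspective(joints_3d, scale, trans_2d)
    return float(np.mean(np.sum((local_kp_2d - reproj) ** 2, axis=-1)))

# Toy data: 2 hands, 21 joints each (hypothetical values).
rng = np.random.default_rng(0)
joints_3d = rng.normal(size=(2, 21, 3))
scale = np.array([1.5, 0.8])
trans_2d = np.array([[10.0, 20.0], [-5.0, 3.0]])

# Keypoints that agree exactly with the re-projection give zero loss;
# shifting every keypoint by one pixel in x and y raises the loss.
consistent_kp = project_weak_perspective(joints_3d, scale, trans_2d)
loss_zero = keypoint_consistency_loss(consistent_kp, joints_3d, scale, trans_2d)
loss_shifted = keypoint_consistency_loss(consistent_kp + 1.0, joints_3d, scale, trans_2d)
```

In this toy setup the loss vanishes when the two estimates agree and grows with their disagreement, which is the behaviour a consistency term between the local and global branches would need.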