In this work, we propose a framework for single-view hand mesh reconstruction that simultaneously achieves high reconstruction accuracy, fast inference speed, and temporal coherence. For 2D encoding, we propose lightweight yet effective stacked structures. For 3D decoding, we provide an efficient graph operator, namely depth-separable spiral convolution. Moreover, we present a novel feature lifting module that bridges the gap between 2D and 3D representations. This module begins with a map-based position regression (MapReg) block, which integrates the merits of the heatmap encoding and position regression paradigms to improve 2D accuracy and temporal coherence. MapReg is followed by pose pooling and pose-to-vertex lifting, which transform 2D pose encodings into semantic features of 3D vertices. Overall, our hand reconstruction framework, called MobRecon, has a low computational cost and a small model size, and reaches an inference speed of 83 FPS on an Apple A14 CPU. Extensive experiments on popular datasets such as FreiHAND, RHD, and HO3Dv2 demonstrate that MobRecon achieves superior reconstruction accuracy and temporal coherence. Our code is publicly available at https://github.com/SeanChenxy/HandMesh.
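To make the depth-separable spiral convolution named in the abstract more concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes each vertex comes with a precomputed, fixed-length spiral of neighbor indices, and it factorizes the operator into a depth-wise step (each channel aggregates its own spiral neighborhood) and a point-wise step (channels are mixed per vertex). The function names, the weight layouts, and the omission of bias terms are illustrative assumptions.

```python
def depth_separable_spiral_conv(x, spirals, dw, pw):
    """Hypothetical sketch of a depth-wise separable spiral convolution.

    x:       V x C_in vertex features (list of lists)
    spirals: V x S neighbor indices, one fixed-length spiral per vertex
    dw:      C_in x S depth-wise weights (one filter per input channel)
    pw:      C_out x C_in point-wise weights (per-vertex channel mixing)
    Bias terms are omitted for brevity.
    """
    out = []
    for spiral in spirals:
        # Depth-wise step: each channel aggregates only its own values
        # along the spiral neighborhood of the current vertex.
        mid = [
            sum(dw[c][s] * x[j][c] for s, j in enumerate(spiral))
            for c in range(len(dw))
        ]
        # Point-wise step: mix channels at this vertex.
        out.append([sum(w * m for w, m in zip(row, mid)) for row in pw])
    return out


def param_counts(spiral_len, c_in, c_out):
    """Weight counts (no bias) for a dense spiral layer vs. the factorized one."""
    standard = spiral_len * c_in * c_out
    separable = spiral_len * c_in + c_in * c_out
    return standard, separable
```

For example, with an assumed spiral length of 9 and 256 input and output channels, `param_counts(9, 256, 256)` gives 589824 weights for a dense spiral layer against 67840 for the factorized form, which illustrates why this kind of factorization suits a mobile-oriented model.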