Non-parametric mesh reconstruction has recently shown significant progress in 3D hand and body applications. In these methods, mesh vertices and edges are visible to the neural network, which makes it possible to establish a direct mapping between 2D image pixels and 3D mesh vertices. In this paper, we seek to establish and exploit this mapping with a simple and compact architecture. The network is designed with two considerations: 1) aggregating both local 2D image features from the encoder and 3D geometric features captured in the mesh decoder; 2) decoding coarse-to-fine meshes along the decoding layers to make the best use of the hierarchical multi-scale information. Specifically, we propose an end-to-end pipeline for hand mesh recovery that consists of three phases: a 2D feature extractor that constructs multi-scale feature maps, a feature mapping module that transforms local 2D image features into 3D vertex features via 3D-to-2D projection, and a mesh decoder that combines graph convolution and self-attention to reconstruct the mesh. The decoder aggregates both local image features at pixels and geometric features at vertices. It also regresses the mesh vertices in a coarse-to-fine manner, which lets it leverage multi-scale information. By exploiting these local connections and the design of the mesh decoder, our approach achieves state-of-the-art performance for hand mesh reconstruction on the public FreiHAND dataset.
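The feature mapping step described above (projecting 3D vertices into the image and gathering local 2D features for each vertex) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the pinhole intrinsics `K`, the bilinear sampling scheme, the `stride` parameter, and the function names `project_vertices` and `sample_features` are all assumptions made for illustration, since the abstract does not specify the camera model or the sampling method.

```python
import numpy as np


def project_vertices(verts, K):
    """Project (N, 3) camera-space vertices to (N, 2) pixel coordinates.

    Assumes a pinhole camera with 3x3 intrinsics K (an illustrative choice).
    """
    uvw = verts @ K.T                  # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # divide by depth


def sample_features(feat, uv, stride=1.0):
    """Bilinearly sample a (C, H, W) feature map at (N, 2) pixel coordinates.

    `stride` is the hypothetical downsampling factor of this feature map
    relative to the input image; coordinates are clamped to the map borders.
    Returns an (N, C) array holding one feature vector per vertex.
    """
    _, H, W = feat.shape
    u = np.clip(uv[:, 0] / stride, 0, W - 1)
    v = np.clip(uv[:, 1] / stride, 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    # Interpolate along u on the two neighbouring rows, then along v.
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u1] * du   # (C, N)
    bot = feat[:, v1, u0] * (1 - du) + feat[:, v1, u1] * du   # (C, N)
    return (top * (1 - dv) + bot * dv).T                      # (N, C)
```

In a multi-scale setting, `sample_features` would be called once per feature map with the matching `stride`, using a coarser mesh for the low-resolution maps and a finer mesh for the high-resolution ones; how the sampled features are then fused in the decoder is beyond what this sketch covers.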