Most existing animal pose and shape estimation approaches reconstruct animal meshes with the parametric SMAL model. This is because the low-dimensional pose and shape parameters of the SMAL model make it easier for deep networks to learn the high-dimensional animal meshes. However, the SMAL model is learned from scans of toy animals with limited pose and shape variations, and thus may not represent highly varying real animals well. This may result in poor fits of the estimated meshes to the 2D evidence, e.g., 2D keypoints or silhouettes. To mitigate this problem, we propose a coarse-to-fine approach to reconstruct a 3D animal mesh from a single image. The coarse estimation stage first estimates the pose, shape and translation parameters of the SMAL model. The estimated meshes are then used as a starting point by a graph convolutional network (GCN) that predicts a per-vertex deformation in the refinement stage. This combination of SMAL-based and vertex-based representations benefits from both parametric and non-parametric representations. We design our mesh refinement GCN (MRGCN) as an encoder-decoder structure with hierarchical feature representations to overcome the limited receptive field of traditional GCNs. Moreover, we observe that the global image feature used by existing animal mesh reconstruction methods is unable to capture detailed shape information for mesh refinement. We thus introduce a local feature extractor to retrieve a vertex-level feature and use it together with the global feature as the input to the MRGCN. We test our approach on the StanfordExtra dataset and achieve state-of-the-art results. Furthermore, we test the generalization capacity of our approach on the Animal Pose and BADJA datasets. Our code is available at the project website.
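To make the refinement stage concrete, the following is a minimal NumPy sketch of the idea described above, not the authors' implementation. It assumes the coarse SMAL vertices and their 2D image projections are already available, samples a vertex-level feature from an image feature map by bilinear interpolation, concatenates it with a global feature, and predicts per-vertex offsets that are added to the coarse mesh. All function names, array shapes and weight matrices are hypothetical, and a plain two-layer GCN stands in for the hierarchical encoder-decoder MRGCN.

```python
import numpy as np


def normalized_adjacency(num_vertices, edges):
    """Symmetric normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_vertices)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sample_local_features(feature_map, vertices_2d):
    """Bilinearly sample an (H, W, C) feature map at (x, y) pixel positions."""
    h, w, _ = feature_map.shape
    x = np.clip(vertices_2d[:, 0], 0, w - 1)
    y = np.clip(vertices_2d[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bottom = (1 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1 - wy) * top + wy * bottom


def refine_mesh(coarse_vertices, vertices_2d, feature_map, global_feature,
                adj, w_hidden, w_out):
    """Add GCN-predicted per-vertex offsets to the coarse (SMAL) vertices."""
    local = sample_local_features(feature_map, vertices_2d)           # (V, C)
    global_tiled = np.tile(global_feature, (len(coarse_vertices), 1))  # (V, G)
    x = np.concatenate([coarse_vertices, local, global_tiled], axis=1)
    hidden = np.maximum(adj @ x @ w_hidden, 0.0)  # graph convolution + ReLU
    offsets = adj @ hidden @ w_out                # (V, 3) per-vertex deformation
    return coarse_vertices + offsets


# Toy usage: a tetrahedron stands in for the coarse SMAL mesh.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(4, 3))
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
adj = normalized_adjacency(4, edges)
feature_map = rng.normal(size=(8, 8, 5))     # hypothetical CNN feature map
global_feature = rng.normal(size=6)          # hypothetical global image feature
vertices_2d = rng.uniform(0, 7, size=(4, 2))  # assumed camera projection output
w_hidden = rng.normal(size=(3 + 5 + 6, 16)) * 0.1
w_out = rng.normal(size=(16, 3)) * 0.1
refined = refine_mesh(coarse, vertices_2d, feature_map, global_feature,
                      adj, w_hidden, w_out)
print(refined.shape)
```

Predicting offsets rather than absolute vertex positions keeps the SMAL estimate as the starting point: when the predicted offsets are zero, the refined mesh is exactly the coarse mesh.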