Recent 2D-to-3D human pose estimation methods tend to exploit the graph structure formed by the topology of the human skeleton. However, we argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem. To overcome these weaknesses, we propose a novel graph convolutional network architecture, Hierarchical Graph Networks (HGN). It is based on a denser graph topology generated by our multi-scale graph structure building strategy, and thus provides finer geometric information. The proposed architecture contains three sparse-to-fine representation subnetworks organized in parallel, in which multi-scale graph-structured features are processed and exchange information through a novel feature fusion strategy, leading to rich hierarchical representations. We also introduce a 3D coarse mesh constraint to further boost detail-related feature learning. Extensive experiments demonstrate that our HGN achieves state-of-the-art performance with fewer network parameters.
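To make the sparsity argument concrete, the following is a minimal sketch of a graph convolution over a skeleton graph. It is not the HGN layer: it uses a generic symmetric-normalized graph convolution, and the five-joint skeleton, the edge list `EDGES`, the feature width of 8, and the helper names are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton (NOT the joint set used in the paper):
# 0 = pelvis, 1 = spine, 2 = head, 3 = left hand, 4 = right hand
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)  # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """One generic graph convolution with ReLU: relu(A_hat @ X @ W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def edge_density(num_nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    return len(edges) / (num_nodes * (num_nodes - 1) / 2)


rng = np.random.default_rng(0)
joints_2d = rng.normal(size=(5, 2))  # stand-in for detected 2D joints
weights = rng.normal(size=(2, 8))    # assumed feature width of 8

a_hat = normalized_adjacency(5, EDGES)
features = graph_conv(joints_2d, a_hat, weights)
density = edge_density(5, EDGES)
```

In this toy graph only 4 of the 10 possible joint pairs are connected, so each layer mixes information only between directly linked joints. A denser multi-scale topology of the kind the abstract describes would add nodes and edges to this adjacency.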