In this paper, we propose Graph Stacked Hourglass Networks, a novel graph convolutional network architecture for 2D-to-3D human pose estimation. The architecture consists of repeated encoder-decoder modules in which graph-structured features are processed across three scales of human skeletal representation. This multi-scale design enables the model to learn both local and global feature representations, which are critical for 3D human pose estimation. We also introduce a multi-level feature learning approach that uses intermediate features from different depths, and we show the performance improvements that result from exploiting multi-scale and multi-level feature representations. Extensive experiments validate our approach, and the results show that our model outperforms state-of-the-art methods.
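To make the encoder-decoder idea concrete, the following is a minimal NumPy sketch of one possible graph hourglass, not the implementation evaluated in this paper. Everything specific in it is an assumption made for illustration: the 16-joint chain skeleton, the pairwise joint grouping that yields 8-node and 4-node graphs, average pooling, copy-based unpooling, additive skip connections, the channel width, and all function names. Multi-level feature learning is omitted.

```python
import numpy as np

def chain_adjacency(n):
    """Normalized adjacency D^-1/2 (A + I) D^-1/2 of an n-node chain (assumed skeleton)."""
    a = np.eye(n)
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    d = a.sum(axis=1)
    return a / np.sqrt(np.outer(d, d))

def pair_groups(n):
    """Hypothetical grouping: merge neighbouring joints two at a time."""
    return [[2 * k, 2 * k + 1] for k in range(n // 2)]

# Three skeletal scales (16, 8, 4 nodes); the sizes are illustrative assumptions.
ADJ = {n: chain_adjacency(n) for n in (16, 8, 4)}
GROUPS = {16: pair_groups(16), 8: pair_groups(8)}

def graph_conv(x, adj, w):
    """One graph convolution followed by ReLU: relu(adj @ x @ w)."""
    return np.maximum(adj @ x @ w, 0.0)

def pool(x, groups):
    """Average the features of each joint group to form a coarser graph."""
    return np.stack([x[g].mean(axis=0) for g in groups])

def unpool(x, groups, n):
    """Copy each coarse node's features back to every joint in its group."""
    out = np.zeros((n, x.shape[1]))
    for k, g in enumerate(groups):
        out[g] = x[k]
    return out

def hourglass(x, w):
    """Encoder (16 -> 8 -> 4 nodes) and decoder (4 -> 8 -> 16) with skip connections."""
    h16 = graph_conv(x, ADJ[16], w[0])
    h8 = graph_conv(pool(h16, GROUPS[16]), ADJ[8], w[1])
    h4 = graph_conv(pool(h8, GROUPS[8]), ADJ[4], w[2])
    u8 = graph_conv(unpool(h4, GROUPS[8], 8) + h8, ADJ[8], w[3])
    return graph_conv(unpool(u8, GROUPS[16], 16) + h16, ADJ[16], w[4])

def init_params(rng, channels=8, n_stacks=2):
    """Random weights for a stack of hourglasses plus a linear 3D output head."""
    stacks, in_dim = [], 2  # the first hourglass takes 2D joint coordinates
    for _ in range(n_stacks):
        dims = [in_dim] + [channels] * 5
        stacks.append([rng.normal(scale=0.5, size=(dims[i], dims[i + 1]))
                       for i in range(5)])
        in_dim = channels
    return {"stacks": stacks, "head": rng.normal(scale=0.5, size=(channels, 3))}

def lift_pose(pose_2d, params):
    """Map a (16, 2) array of 2D joints to a (16, 3) array of 3D joints."""
    h = pose_2d
    for w in params["stacks"]:
        h = hourglass(h, w)
    return h @ params["head"]

rng = np.random.default_rng(0)
params = init_params(rng)
pose_3d = lift_pose(rng.normal(size=(16, 2)), params)
print(pose_3d.shape)  # (16, 3)
```

In this sketch the coarse graphs mix information between distant joints, while the skip connections carry the fine-scale features forward, which is one simple way to combine local and global context. The weights are random, so the output only demonstrates the data flow and tensor shapes.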