Graph convolutional networks (GCNs) have been applied to 3D human pose estimation (HPE), and pure transformer models have recently shown promising results in video-based methods. However, single-frame methods still need to model the physically connected relations among joints, because feature representations produced by global attention alone lack the structural relationships of the human skeleton. To address this problem, we propose a novel architecture, AMPose, that combines the physically connected and global relations among joints in the human skeleton for human pose estimation. We demonstrate the effectiveness of the proposed method through evaluation on the Human3.6M dataset. Our model also shows better generalization ability in a cross-dataset comparison on MPI-INF-3DHP.
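To make the distinction between the two kinds of joint relations concrete, the following is a minimal sketch and not the AMPose architecture itself, whose details are not given here. It assumes a hypothetical 5-joint chain skeleton, uses a weight-free normalized-adjacency propagation step as a stand-in for a graph convolution, and uses scaled dot-product self-attention with identity projections as a stand-in for global attention. The names `EDGES`, `normalized_adjacency`, `attention_weights`, and `combine` are illustrative, and summing the two branches is only one of several plausible ways to combine them.

```python
import math

# Hypothetical 5-joint chain skeleton 0-1-2-3-4 (illustration only,
# not the Human3.6M joint layout).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(edges, n):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of the skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def attention_weights(x):
    """Row-wise softmax of X X^T / sqrt(d): every joint attends to every joint."""
    d = len(x[0])
    xt = [list(col) for col in zip(*x)]
    weights = []
    for row in matmul(x, xt):
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def combine(x, edges=EDGES):
    """Sum a skeleton-local branch and a global-attention branch (an assumption)."""
    local = matmul(normalized_adjacency(edges, len(x)), x)   # physically connected joints only
    glob = matmul(attention_weights(x), x)                   # all joint pairs
    return [[l + g for l, g in zip(lr, gr)] for lr, gr in zip(local, glob)]
```

In this sketch the adjacency matrix has zero entries for joints that are not physically connected, whereas every attention weight is strictly positive, which is the contrast between the two relation types that the abstract refers to.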