Various deep learning techniques have been proposed to solve the single-view 2D-to-3D pose estimation problem. While average prediction accuracy has improved significantly over the years, performance on hard poses (those with depth ambiguity, self-occlusion, or complex or rare configurations) is still far from satisfactory. In this work, we target these hard poses and present a novel skeletal GNN learning solution. Specifically, we propose a hop-aware hierarchical channel-squeezing fusion layer that effectively extracts relevant information from neighboring nodes while suppressing undesired noise in GNN learning. In addition, we propose a temporal-aware dynamic graph construction procedure that is robust and effective for 3D pose estimation. Experimental results on the Human3.6M dataset show that our solution improves average prediction accuracy by 10.3% and greatly improves results on hard poses over state-of-the-art techniques. We further apply the proposed technique to the skeleton-based action recognition task and also achieve state-of-the-art performance. Our code is available at https://github.com/ailingzengzzz/Skeletal-GNN.
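To make the idea of hop-aware channel squeezing concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes that joints are grouped by their exact hop distance from a target joint, that each group is mean-pooled, and that more distant groups are projected to fewer channels before concatenation. The function names, the toy 5-joint chain, and the fixed weight matrices are all hypothetical choices made for illustration.

```python
from collections import deque


def hop_distances(adj_list, source):
    """Breadth-first search: hop distance from `source` to every reachable joint."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def project(vec, weight):
    """Multiply a length-c vector by a (c x c_out) matrix stored as a list of rows."""
    c_out = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vec, weight)) for j in range(c_out)]


def hop_aware_fusion(features, adj_list, weights):
    """Fuse per-hop information for every joint (illustrative assumption only).

    features: list of per-joint feature vectors, each of length c.
    weights:  weights[k] projects the mean feature of joints exactly k hops
              away; later matrices have fewer output channels ("squeezing").
    """
    c = len(features[0])
    fused_all = []
    for i in range(len(features)):
        dist = hop_distances(adj_list, i)
        fused = []
        for k, w in enumerate(weights):
            members = [j for j, d in dist.items() if d == k]
            if members:
                mean = [sum(features[j][ch] for j in members) / len(members)
                        for ch in range(c)]
            else:
                mean = [0.0] * c  # no joint at this hop distance
            fused.extend(project(mean, w))
        fused_all.append(fused)
    return fused_all


# Toy skeleton: a chain of 5 joints, 0 - 1 - 2 - 3 - 4 (hypothetical).
adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = [[float(i)] * 4 for i in range(5)]

# Hop 0 keeps 4 channels, hop 1 is squeezed to 2, hop 2 to 1 (fixed toy weights).
w0 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
w1 = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
w2 = [[0.25], [0.25], [0.25], [0.25]]

out = hop_aware_fusion(features, adj_list, [w0, w1, w2])
print(len(out), len(out[0]))
```

In this sketch each joint ends up with 4 + 2 + 1 = 7 channels, so distant joints contribute progressively less capacity to the fused representation. A learned version would replace the fixed matrices with trainable parameters.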