Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated nature of human skeletons because the hierarchical order among the joints is not explicitly represented. In this paper, we propose to represent the human skeleton as a directed graph with the joints as nodes and the bones as edges directed from parent joints to child joints. In doing so, the directions of the edges explicitly reflect the hierarchical relationships among the nodes. Based on this representation, we adopt the spatial-temporal directed graph convolution (ST-DGConv) to extract features from 2D poses represented as a temporal sequence of directed graphs. We further propose a spatial-temporal conditional directed graph convolution (ST-CondDGConv) to leverage varying non-local dependencies for different poses by conditioning the graph topology on the input poses. Altogether, we form a U-shaped network with ST-DGConv and ST-CondDGConv layers, named U-shaped Conditional Directed Graph Convolutional Network (U-CondDGCN), for 3D human pose estimation from monocular videos. To evaluate the effectiveness of U-CondDGCN, we conduct extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. Ablation studies further show that directed graphs exploit the hierarchy of articulated human skeletons better than undirected graphs, and that the conditional connections yield adaptive graph topologies for different kinds of poses.
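To make the directed-graph representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical 17-joint skeleton given as a parent list (the joint ordering and the names `PARENTS`, `directed_adjacency`, and `joint_depths` are illustrative assumptions), builds the parent-to-child adjacency matrix, and contrasts it with the symmetric undirected version.

```python
import numpy as np

# Hypothetical 17-joint skeleton (an assumption, not taken from the paper):
# PARENTS[j] is the index of the parent of joint j; -1 marks the root joint.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]


def directed_adjacency(parents):
    """Return A with A[p, c] = 1 for every bone directed from parent p to child c."""
    n = len(parents)
    adj = np.zeros((n, n), dtype=np.float32)
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[parent, child] = 1.0
    return adj


def joint_depths(parents):
    """Return the number of bones between each joint and the root."""
    depths = []
    for joint in range(len(parents)):
        depth, node = 0, joint
        while parents[node] >= 0:
            node = parents[node]
            depth += 1
        depths.append(depth)
    return depths


A_dir = directed_adjacency(PARENTS)
# The undirected graph discards edge direction by symmetrizing the matrix.
A_undir = np.maximum(A_dir, A_dir.T)
DEPTHS = joint_depths(PARENTS)
```

In this sketch `A_dir` is asymmetric, so the parent and the child of each bone can be told apart, whereas `A_undir` treats both endpoints identically; the hierarchy depth of each joint is recoverable only from the directed form.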