The ability to estimate 3D human shape and pose from images is useful in many contexts. Recent approaches have explored graph convolutional networks and achieved promising results. Because the 3D shape is represented by a mesh, an undirected graph, graph convolutional networks are a natural fit for this problem. However, graph convolutional networks have limited representation power: information from a node is passed only to its connected neighbors, so propagating information across the graph requires successive graph convolutions. To overcome this limitation, we propose a dual-scale graph approach. We use a coarse graph, derived from a dense graph, to estimate the human's 3D pose, and the dense graph to estimate the 3D shape. Information in a coarse graph can be propagated over longer distances than in a dense graph. In addition, information about pose can guide the recovery of local shape detail, and vice versa. We recognize that the connection between the coarse and dense graphs is itself a graph, and introduce graph fusion blocks to exchange information between graphs of different scales. We train our model end-to-end and show that it achieves state-of-the-art results on several evaluation datasets.
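The dual-scale idea described above can be illustrated with a minimal sketch: one graph convolution on each scale, followed by a fusion step that upsamples coarse features to the dense mesh and pools dense features back to the coarse graph. This is a toy assumption, not the paper's implementation; the graph sizes, the assignment matrix `U`, and the weight matrices are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: a "dense" mesh with 6 nodes and a "coarse" graph with 2 nodes
# (real meshes have thousands of vertices; these values are illustrative).
n_dense, n_coarse, f = 6, 2, 4

# Adjacency of the dense mesh (a chain) and of the coarse graph (one edge).
A_dense = np.zeros((n_dense, n_dense))
for i in range(n_dense - 1):
    A_dense[i, i + 1] = A_dense[i + 1, i] = 1.0
A_coarse = np.array([[0.0, 1.0], [1.0, 0.0]])

# Hypothetical assignment matrix U: dense node i belongs to coarse node i // 3.
U = np.zeros((n_dense, n_coarse))
for i in range(n_dense):
    U[i, i // 3] = 1.0

def normalize(A):
    """Symmetric normalization of A + I, as in standard graph convolutions."""
    A = A + np.eye(len(A))
    d = np.sqrt(A.sum(1))
    return A / d[:, None] / d[None, :]

def gconv(X, A_hat, W):
    """One graph convolution with ReLU: each node mixes 1-hop neighbor features."""
    return np.maximum(A_hat @ X @ W, 0.0)

def fusion(Xd, Xc, Wd, Wc):
    """Fusion block sketch: coarse-to-dense upsampling via U, dense-to-coarse
    pooling via the column-normalized transpose of U, then a linear mix per scale."""
    up = U @ Xc                   # copy each coarse feature to its dense nodes
    down = (U / U.sum(0)).T @ Xd  # average dense features per coarse node
    return np.maximum((Xd + up) @ Wd, 0.0), np.maximum((Xc + down) @ Wc, 0.0)

Ad, Ac = normalize(A_dense), normalize(A_coarse)
Wd = rng.standard_normal((f, f)) * 0.1
Wc = rng.standard_normal((f, f)) * 0.1
Xd = rng.standard_normal((n_dense, f))
Xc = rng.standard_normal((n_coarse, f))

# One dual-scale stage: convolve each graph, then exchange information.
Xd, Xc = gconv(Xd, Ad, Wd), gconv(Xc, Ac, Wc)
Xd, Xc = fusion(Xd, Xc, Wd, Wc)
print(Xd.shape, Xc.shape)  # (6, 4) (2, 4)
```

Note how the coarse graph covers the whole mesh in a single hop here, while the dense chain would need five convolutions for information to travel end to end; this is the propagation-distance advantage the abstract refers to.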