Bottom-up approaches to image-based multi-person pose estimation consist of two stages: (1) keypoint detection and (2) grouping of the detected keypoints into person instances. Current grouping approaches rely on embeddings learned only from visual features, and thus completely ignore the spatial configuration of human poses. In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN). More specifically, we design a Geometry-aware Association GNN that utilizes the spatial information of the keypoints and learns local affinity from the global context. The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association. Spectral clustering is then used to partition the graph and form the pose instances. Experimental results on two benchmark datasets show that our proposed method outperforms existing appearance-only grouping frameworks, demonstrating the effectiveness of utilizing spatial context for robust grouping. Source code is available at: https://github.com/jiahaoLjh/PoseGrouping.
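The grouping stage described above (fuse two affinity matrices, then partition the keypoint graph with spectral clustering) can be illustrated with a minimal NumPy sketch. This is not the released implementation. The GNN that learns the geometry-based affinity is not reproduced, and the matrices below are hand-made toy values. The weighted-average fusion rule, the function names `fuse_affinities` and `spectral_partition`, and the assumption that the number of persons is known in advance are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def fuse_affinities(a_geo, a_app, weight=0.5):
    """Blend two symmetric affinity matrices (assumed fusion rule: convex combination)."""
    return weight * a_geo + (1.0 - weight) * a_app


def spectral_partition(affinity, n_groups, n_iter=20):
    """Partition graph nodes by k-means on the leading eigenvectors of the normalized Laplacian."""
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the n_groups smallest.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :n_groups]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization for a small k-means.
    centers = [emb[0]]
    for _ in range(1, n_groups):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)

    labels = np.zeros(len(emb), dtype=int)
    for _ in range(n_iter):
        sq_dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        for k in range(n_groups):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels


def block_affinity(within, across, size=3):
    """Toy affinity for two persons with `size` keypoints each (hypothetical data)."""
    a = np.full((2 * size, 2 * size), across)
    a[:size, :size] = within
    a[size:, size:] = within
    np.fill_diagonal(a, 1.0)
    return a


# Six keypoints: indices 0-2 belong to one person, 3-5 to another.
a_geo = block_affinity(within=0.9, across=0.1)  # stand-in for the GNN output
a_app = block_affinity(within=0.8, across=0.2)  # stand-in for appearance affinity
labels = spectral_partition(fuse_affinities(a_geo, a_app), n_groups=2)
```

Keypoints that receive the same label are treated as one pose instance. In practice the number of groups would have to be estimated, for example from the eigenvalue gap or from the detection counts, which this sketch leaves out.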